Summary


The Telecommunications, Navigation, and Information Management (TNIM) system will be a major el- 
ement in the Space Exploration Initiative (SEI). Certain time-critical TNIM functions will have to be 
handled at Mars to avoid the 8 to 40 minute Earth reaction time due to the round trip transmission de- 
lay. With a limited number of astronauts at Mars, it will be imperative that those time-critical functions 
executed at Mars require little or no input from the astronauts. 

TNIM functions during the Apollo era did not suffer from Earth-centered control delays because the 
round trip time was only a matter of seconds between the Earth and Moon. NASA communications 
networks during the Apollo era and up to the present time have been very manpower intensive and there- 
fore have not run unattended. Hence the need exists to develop unattended network elements to support 
SEI communications. The alternative would be an awkward, non-responsive TNIM system for Mars 
exploration which could jeopardize the health and safety of the Mars astronauts. 

The purpose of this study was to explore how a Mars-centered communications network could be 
designed to run in a virtually unattended mode. The proposed Mars network architecture was examined 
and enhancements to the architecture were suggested. These enhancements, if added, will improve the 
fault tolerance of the network via improved connectivity between all nodes. Enhancements suggested 
included: 

• Addition of a third Mars Relay Satellite (MRS), 

• Incorporation of crosslink capabilities between MRS’s, 

• Addition of a Mars Communications Hub, and 

• Addition of a Mars Polar Relay Satellite. 

Mars network management functions were grouped into six categories: Fault Management,
Configuration Management, Accounting Management, Performance Management, Resource Management,
and Security Management. Technology assessments were then performed in each of these areas
to determine what technologies might be needed to limit the astronauts’ role at Mars in the execution of 
these functions. For completeness, it was assumed that all functional areas would be automated at Mars, 
although in actuality only those critical functions that cannot afford a 40 minute delay will have to be au- 
tomated. The remaining non-realtime functions could be performed by network operations personnel at 
Earth. Technologies studied included both conventional and emerging Distributed Artificial Intelligence 
(DAI) network management technologies as well as satellite fault tolerance technologies. 

Finally, a technology development plan was prepared. It was recognized that a network testbedding 
effort should be initiated to define and test different networking standards, technologies, and approaches; 
and to determine which might be the most practical and effective for implementation in a Mars network. 
Such an effort would identify those technologies with promising attributes so that they could be further 
developed to meet the specific needs of a Mars TNIM system. 
Chapter 1

Introduction 


This chapter is organized as follows: 

1.1 Background on Contract 

1.2 Statement of Work 

1.3 Organization of Report 

1.1 Background on this Contract 

Loral is providing technical support to NASA Lewis 
Research Center on a task order basis in the general area 
of defining advanced satellite system concepts, under 
NASA Contract No. NAS3-25092, Advanced Satellite 
Systems Concepts (ASSC). This report gives the results 
of the third Task Order, entitled Unattended Network 
Operations. 

The general scope and objectives of the Task Order 
Contract are as follows. Over the next four years (fis- 
cal years 1989-1992), NASA will be evaluating sev- 
eral new advanced satellite system concepts as potential 
new experimental satellite programs. These are in re- 
sponse to new NASA mission needs as well as respond- 
ing to specific recommendations of the NASA Advisory 
Council. These new concepts include: 

• Data Distribution Satellite 

• Wideband Point-to-Point Communications 

• Intersatellite Communications 

• Small Terminal Communications 

Loral is required to provide personnel and other re- 
sources as needed for technical support to NASA for 
the purpose of aiding NASA in the formulation, evalua- 
tion, and advocacy of certain advanced communication 
satellite applications. Analyses will be performed and 
results provided as required by NASA for the purpose
of explaining and justifying potential future advanced
satellite technology development and flight programs. 

To accomplish these objectives, Loral will perform 
specific tasks that are defined through the issuance of 
Task Orders to perform any of the following: 

i. Definition of preliminary concepts. 

ii. Sensitivity analyses. 

iii. Identification of critical technologies. 

iv. Formulation of preliminary technology plans. 

v. Preparation of written reports, oral reports, and 
graphic presentation materials. 

1.2 Statement of Work 

This section gives the Statement of Work for the Unat- 
tended Network Operations Study which is the subject 
of this Final Report. Background is given together with 
the Scope of Work which is divided into four subtasks. 

1.2.1 Background 

One major element needed to accomplish the ambi- 
tious goals set forth in the Space Exploration Initiative 
(SEI) is the Telecommunications, Navigation, and In- 
formation Management (TNIM) system. In the past, pi- 
loted mission operations, telecommunications, and nav- 
igation functions have operated with Earth-based, pri- 
marily manual and highly personnel-intensive systems. 
While direct extensions of these existing system architectures
may suffice for the lunar TNIM system, such an
Earth-centered, manpower-intensive system would result
in a complex, awkward, and non-responsive TNIM
system for the Martian exploration phase of the SEI. This
is primarily due to the long transmission time between
Mars and Earth, which can be 20 minutes one way.




However, due to the extreme distances and travel 
times involved, manpower in the Martian environment 
will be scarce, costly and would probably better serve 
mission requirements by performing tasks other than 
extensive operations of the TNIM system. It is there- 
fore necessary to develop a highly unattended network 
control mode for mission operations functions. Thus, 
to enable a responsive Martian TNIM system, in situ 
adaptive monitor and control TNIM capabilities must 
be provided. 

Since all of our current experience is with attended 
space networks, in order to avoid a revolutionary 
change from the lunar to Martian exploration operations 
environment, it makes sense to evolve this capability 
utilizing the Moon as a realistic operational test bed. 
Timing envisioned for the lunar and Mars exploration 
programs will allow the lunar setting to be utilized in 
developing and implementing this capability for use on 
the Mars missions. 

The network monitor and control functions are used 
to: 

i. Plan and schedule network usage; 

ii. Provide real time adaptive network control which 
can accommodate reasonable faults; 

iii. Multiplex and packetize the data at a system entry 
point; 

iv. Reconstruct the original data streams at a network 
mission delivery point; 

v. Extract network configuration and status informa- 
tion as part of the data reconstruction process at 
each delivery point; and 

vi. Determine radiometric navigation information
from tracking data.
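Functions iii and iv in the list above can be illustrated with a minimal sketch. The packet layout (stream identifier, sequence number, payload), the round-robin interleaving, and every name below are assumptions made for illustration; this study does not specify a packet format.

```python
from collections import defaultdict
from itertools import zip_longest

def packetize(stream_id, data, size):
    """Split one data stream into (stream_id, sequence_number, payload) packets."""
    return [(stream_id, seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def multiplex(*packet_lists):
    """Interleave packets from several streams round-robin onto one link."""
    merged = []
    for group in zip_longest(*packet_lists):
        merged.extend(p for p in group if p is not None)
    return merged

def reconstruct(packets):
    """Rebuild each original stream at the delivery point, tolerating reordering."""
    by_stream = defaultdict(list)
    for stream_id, seq, payload in packets:
        by_stream[stream_id].append((seq, payload))
    return {sid: b"".join(payload for _, payload in sorted(chunks))
            for sid, chunks in by_stream.items()}

# Hypothetical example data.
telemetry = b"habitat pressure nominal"
video = b"frame-0001 frame-0002"
link = multiplex(packetize("telemetry", telemetry, 8), packetize("video", video, 8))
streams = reconstruct(reversed(link))  # reversed to mimic out-of-order arrival
```

Carrying a sequence number in every packet is what allows the delivery point to rebuild each stream without depending on arrival order.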

Once again, because of the Moon’s proximity to the 
Earth, it is feasible to conduct an exclusively Earth- 
based network monitor and control function for lunar 
exploration. However, due to the long transmission
time resulting from the extreme distance between Earth
and Mars, an Earth-centered Martian monitor and control
system could not respond to any short-term (i.e.,
less than 40 minutes) mission operation needs. Consequently,
evolutionary approaches to developing the required
capability of infrequently attended network operations,
starting with lunar exploration, are to be examined
by this study.


Mars tracking and data acquisition functions are sim- 
ilar to those in the lunar architecture, but with three sig- 
nificant differences: 

1. The round trip transmission time is up to 40 min- 
utes compared to several seconds for the Moon. 

2. The Martian rotational period is a little over 24 
hours which causes facilities on the Martian sur- 
face to lose direct connectivity with the Earth for 
10 to 12 hours per day. This may require relay 
satellites placed around Mars. In contrast, relay 
satellites for lunar exploration may only be re- 
quired when accessing lunar far side regions. 

3. The large telecommunications distance from Earth 
to Mars causes all Mars-to-Earth links to be 
severely power-limited (resulting in data rate re- 
strictions). In contrast, achievable lunar power 
levels could telemeter as much data as Earth-based 
users can utilize. 
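The delay figures in item 1 follow directly from the distance and the speed of light. The sketch below is a worked check, not mission data: the two Mars distances are round numbers chosen here so that the result lands near the 8 and 40 minute round-trip cases, and the Earth-Moon distance is an approximate mean value.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s

def round_trip_delay_s(distance_km):
    """Round-trip signal time in seconds over a one-way distance in km."""
    return 2.0 * distance_km / SPEED_OF_LIGHT_KM_S

# Assumed, rounded distances for illustration only.
EARTH_MOON_KM = 384_400          # approximate mean Earth-Moon distance
EARTH_MARS_NEAR_KM = 72_000_000  # chosen to give roughly the 8 minute case
EARTH_MARS_FAR_KM = 360_000_000  # chosen to give roughly the 40 minute case

moon_s = round_trip_delay_s(EARTH_MOON_KM)
mars_near_min = round_trip_delay_s(EARTH_MARS_NEAR_KM) / 60
mars_far_min = round_trip_delay_s(EARTH_MARS_FAR_KM) / 60
```

Under these assumptions the lunar round trip is a few seconds while the Mars round trip is measured in minutes, which is the difference that motivates local control at Mars.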

Due to these factors, the Mars telecommunications system
must utilize Martian relay satellites, operate primarily
at Ka-band (or higher) frequencies, and have telecommunications
terminals that operate in infrequently
attended modes. The lunar system may initially start
operations in an Earth-based attended mode, and evolve
to an infrequently attended system which has a fail-soft
capability of manual operations from Earth.

1.2.2 Scope of Work 

The purpose of this study is: 

a. To explore aspects of network operations for the
SEI Mars manned missions that can or should be
automated to achieve infrequently attended network
operations,

b. To assess what technology and advanced engineer- 
ing development is required to accomplish this 
goal, and 

c. To determine requirements which must be in- 
cluded in the initial lunar SEI TNIM system so 
that the lunar system can serve as a development 
and test bed facility for evolution of Martian ex- 
ploration TNIM. 
1.2.2.1 Task 1: Identification of Functions for
Unattended Operations

The purpose of this task is to investigate the various as- 
pects of mission operations network control to deter- 
mine which network operations tasks need to be auto- 
mated to achieve largely unattended operations for the 
Mars manned missions of the Space Exploration Ini- 
tiative (SEI). This would involve consideration of such 
things as operations of satellite master control centers, 
NASA mission control operations for manned missions, 
operations for the TDRS system, and other existing net- 
work control facilities which might be applicable to a 
Mars manned mission network control system. 

Network control functions may include such things
as planning, scheduling, and controlling the telecommunications,
navigation, and information management
(TNIM) systems; network fault detection and isolation;
fault correction, which can include dynamic reconfiguration;
and other network operation and control functions.
This investigation is not bound to current examples and 
should include new and innovative ideas in the area of 
mission operations and network control. 


1.2.2.2 Task 2: Technology Assessment

Once network operation functions with the potential or 
need to be automated have been identified, an assess- 
ment of alternative innovative system concepts which 
might affect the technology level needed to implement 
these functions is to be conducted. Promising systems 
concepts, those that suggest reasonable implementation 
and operations approaches, are to be examined to deter- 
mine the technology and advanced engineering impli- 
cations of these approaches. 

This technology assessment for infrequently attended 
network operations in the Mars manned mission is to 
include an examination of those functions identified in 
Task 1 of this study and a determination of the technol- 
ogy development required, if any, to advance each of 
these functions to the appropriate level of development 
required for such an unattended network. Such technol- 
ogy development may target hardware, software, and 
operations approach and philosophy; or a combination 
of these areas. This assessment would take into account 
such things as the necessity and the feasibility of au- 
tomating functions as well as the time frame in which 
this unattended network is required. 


1.2.2.3 Task 3: Technology Development Plan

The purpose of this task is to provide an estimate of the 
schedule and funding required to develop the necessary 
technologies and advanced engineering for use in the 
SEI Mars manned missions. It may be assumed that the 
lunar TNIM system could be used as a test bed for these 
technologies and for the concept of unattended network 
operations in general. 

The schedule and funding estimates should be done 
on a technology by technology basis. If a given technol- 
ogy is common to more than one area or function, this 
should be taken into account when developing schedule 
and required funding estimates. Schedules (i.e., timelines)
for technology development should be given on
a generic basis rather than assuming a specific starting 
date for such development. Similarly, funding estimates 
should be given in constant 1991 dollars without taking 
into account inflation and should be given on a yearly 
basis. 

1.2.2.4 Task 4: Reporting 

The expected product from this study is a final report 
which documents: 

1. Network operations functions that can or should be
automated; 

2. Requirements that should be included in the lunar 
TNIM designs to make that system appropriate to 
be used as a Mars TNIM test bed; 

3. Explicit identification of innovative ideas recom- 
mended; and 

4. Technology assessment of these functions to pro- 
vide infrequently attended network operations. 

The total reporting requirements are as follows: 

• Concise monthly progress reports to the technical
manager;

• Formal presentation at NASA Lewis Research 
Center upon completion of this study summarizing 
the methodology and results of this study; 

• Final report as described above. 
Table 1-1: Organization of Report

Chapter   Contents
1.        Introduction
2.        Executive Summary
3.        Baseline Mars TNIM Network Architecture
4.        Overview of TNIM Functions to be Automated
5.        Assessment of Network Management Technologies
6.        Technology Development Plan


1.3 Organization of Report 

Table 1-1 gives the organization of this Final Report by 
chapter. Chapter 2 presents an executive summary of 
the work. 

Chapter 3 presents some of the background informa- 
tion that was consolidated and used by the study group 
over the course of the study. This information helped 
form a baseline architecture for the Mars network that 
could be referred to as the study progressed. It has been 
included in this final report for the reader to reference, 
as well as to present some new thoughts on how the net- 
work architecture could be improved. 

Chapter 4 describes a standard set of Telecommu- 
nications, Navigation, and Information Management 
(TNIM) network control functions as required by the 
SOW Subtask 1, “Identification of Functions for Unattended
Operations”. Six generic network control functions
that would have to be automated in an unattended
network operation are described.

Chapter 5 discusses technologies that might be fur- 
ther developed to automate the Mars TNIM functions 
identified in Chapter 4. This is in response to the SOW 
Subtask 2, “Technology Assessment”. The chapter focuses
on two technology areas:

i. Artificial Intelligence technologies needed for tak- 
ing man out of the loop in controlling a network; 

ii. Fault tolerant technologies inherent in the design 
of an unattended network. 

Chapter 6 presents a suggested technology develop- 
ment plan. Ideas are given on how an Earth-based 
testbed for an unattended communications network 
might be constructed. This testbed could be used to 
evaluate promising unattended network technologies 
and concepts that could be integrated into future lunar 
and Mars communications networks. 



Chapter 2 

Executive Summary 


This chapter gives an executive summary of the 
Unattended Network Operations Technology Assess- 
ment Study. Mars network monitor and control func- 
tions to be automated are described first, followed by an 
assessment of the supporting technologies required for 
unattended operation of the network. Then a suggested 
testbed development plan is discussed, and conclusions 
and recommendations for next steps are presented. 

2.1 TNIM Functions to be Automated 

2.2 Assessment of Unattended Network Technologies 

2.3 Technology Development Plan 

2.4 Conclusions & Recommendations 

2.1 TNIM Functions to be Automated 

This section, which is a summary of the material in 
Chapter 4, addresses the Telecommunications, Navi- 
gation, and Information Management (TNIM) network 
monitor and control functions required to support net- 
work communications and navigation in the Mars envi- 
ronment. 

2.1.1 Scope and Methodology 

The Mars TNIM architecture is not fixed at the present 
time. The study only assumes that the network may be 
composed of some combination of the architectural el- 
ements discussed in Chapter 3. 

Network monitor and control functions covered in
this section are common to all network nodes and are applicable
to almost all network architectures. These functions
are used to monitor and control the TNIM network
across which telecommunications and navigation information
will flow.


The TNIM communications system can be modeled 
as a collection of sites, each of which represents a dis- 
tinct distributed location where some type of commu- 
nications equipment resides. These sites are connected 
via the Mars Relay Satellites (MRS’s) located in Mars- 
synchronous orbits. 

The goal of network management is to maintain user-to-user
service under changing traffic conditions, user
requirements, and system interruptions. The resources
available for detecting and diagnosing faults or service
degradation must be used as effectively as possible to
maintain service to the user. The following is a description
of the network monitor and control functions needed to
meet these objectives.

2.1.2 Description of Mars TNIM Monitor and 
Control Functions 

Integrated Network Management (INM) functions fall
into one of six main categories as defined by the International
Organization for Standardization (ISO):

1. Fault management 

2. Configuration management 

3. Accounting management 

4. Performance management 

5. Resource management 

6. Security management 

2.1.2.1 Fault Management Functions 

A comprehensive set of functions for fault handling 
must be provided. These functions include detecting, 
diagnosing, and recovering from network faults. Since 
much of the TNIM equipment will be difficult to access, 
fault handling functions must emphasize early predic- 
tion and prevention of faults or automated switch-over 
to redundant systems. 
2.1.2.2 Configuration Management Functions

Configuration management functions include defin- 
ing, changing, monitoring, and controlling network re- 
sources and data. 

2.1.2.3 Accounting Management Functions 

In order to support the TNIM planning and schedul- 
ing functions, the status and availability of network re- 
sources must be monitored and maintained. Resource 
availabilities may vary due to planned outages such as 
the MRS’s being out of Earth view, or due to unplanned 
events such as equipment failures. 

2.1.2.4 Performance Management Functions 

Performance management functions include monitoring 
both the current and long term performance of the net- 
work. Parameters to be monitored include effective link 
data rates, link data quality, time taken for link acquisi- 
tion, and link down times. 
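The parameters named above could be derived from a per-contact link log along the following lines. This is a sketch under stated assumptions: the record fields, the definition of availability (usable time divided by scheduled time), and the sample numbers are all hypothetical and are not taken from this study.

```python
from dataclasses import dataclass

@dataclass
class LinkPass:
    """One scheduled contact on a link (all fields are hypothetical)."""
    scheduled_s: float    # scheduled contact duration, seconds
    acquisition_s: float  # time spent acquiring the link, seconds
    outage_s: float       # time the link was down after acquisition, seconds
    bits_received: int
    bits_in_error: int

def summarize(passes):
    """Aggregate link performance parameters over a list of contacts."""
    scheduled = sum(p.scheduled_s for p in passes)
    usable = sum(p.scheduled_s - p.acquisition_s - p.outage_s for p in passes)
    bits = sum(p.bits_received for p in passes)
    errors = sum(p.bits_in_error for p in passes)
    return {
        "effective_rate_bps": bits / scheduled,   # effective link data rate
        "bit_error_rate": errors / bits,          # link data quality
        "mean_acquisition_s": sum(p.acquisition_s for p in passes) / len(passes),
        "availability": usable / scheduled,       # complements link down time
    }

log = [LinkPass(3600, 60, 0, 36_000_000, 36),
       LinkPass(3600, 120, 300, 30_000_000, 30)]
report = summarize(log)
```

Keeping the raw per-contact records, rather than only the aggregates, lets the same log serve both current and long-term performance monitoring.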

2.1.2.5 Resource Management Functions 

Once the high level priorities and policies are defined 
and the major events planned, TNIM network resources 
need to be assigned to support them. This will result in 
a high level event schedule and general resource allo- 
cation profile. For example, a general resource alloca- 
tion might allow Mars inhabitants 4 hours per week on 
a video link to Earth. 
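A general resource allocation of this kind can be sketched as a priority-ordered grant of link hours. The weekly capacity, the competing requests, and their priorities are invented for illustration; only the 4 hours per week of video comes from the example above.

```python
def allocate(capacity_hours, requests):
    """Grant link time in priority order (lower number = higher priority).

    requests: list of (name, priority, hours_requested).
    Returns {name: hours_granted}; low-priority requests may be cut or denied.
    """
    remaining = capacity_hours
    granted = {}
    for name, _priority, hours in sorted(requests, key=lambda r: r[1]):
        granted[name] = min(hours, remaining)
        remaining -= granted[name]
    return granted

weekly = allocate(
    capacity_hours=20,  # assumed weekly link capacity
    requests=[
        ("crew video to Earth", 3, 4),        # the 4 hours per week from the text
        ("science data return", 2, 12),       # hypothetical
        ("navigation and tracking", 1, 6),    # hypothetical
    ],
)
```

With these assumed numbers the video request is only partly satisfied, which is the kind of conflict that the high level priorities and policies are meant to settle in advance.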

2.1.2.6 Security Management Functions 

Security Management Functions ensure authorized ac- 
cess to the network resources and user information. 

2.2 Assessment of Unattended Network 
Technologies 

This section summarizes the work presented in Chap- 
ter 5. After reviewing current NASA space-based net- 
work approaches such as DSN, TDRS, and NASCOM, 
it has become evident that little has been done to reduce 
manpower levels in the operations of space-based net- 
works. 

Two technology areas were investigated for unattended
networks:


1. Network Management Technologies 

2. Fault Tolerant Technologies 

The first group consists of the technologies needed to
automate network monitor and control functions. These 
technologies are required to take man out of the control 
and decision making loop for routine Mars network op- 
erations. As mentioned above, current space commu- 
nication networks in use by NASA are very manpower 
intensive in virtually all functional areas (scheduling, 
acquisition, problem resolution, . . . ). However, com- 
mercial network control systems such as telephone and 
information systems have seen considerable automation 
of the processes and functions involved. Some of the
supporting technologies and approaches could be applied
to space-based networks. Even commercial networks,
however, do not run fully unattended; man is still in the
loop to handle faults.

A second technological area, fault prevention and
fault tolerance technologies, was investigated. Prevention
of network faults is highly desirable in any design,
but it becomes especially important in an unattended
network, since preventing faults avoids the complex
decision making and repair efforts that accompany
recovery from faults. Much of the decision making will
have to be done locally at Mars. Fault tolerance includes
fault detection and correction. Inclusion of certain
fault tolerant technologies in the system design will
allow graceful degradation with time as opposed to
sudden and catastrophic failures. The Mars system may
have to last 10 to 15 years, and quite possibly much
longer, at least for the long-term missions.
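The benefit of redundancy for graceful degradation can be made concrete with a standard k-of-n reliability calculation. The sketch assumes identical units that fail independently, and the unit reliability of 0.90 is an arbitrary illustrative value; neither assumption comes from this study.

```python
from math import comb

def k_of_n_reliability(k, n, unit_reliability):
    """Probability that at least k of n identical, independent units work."""
    r = unit_reliability
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

single = k_of_n_reliability(1, 1, 0.90)        # one unit, no redundancy
one_of_two = k_of_n_reliability(1, 2, 0.90)    # one spare
two_of_three = k_of_n_reliability(2, 3, 0.90)  # two needed, one spare
```

Under these assumptions a single spare raises the probability of retaining the function from 0.90 to 0.99, and a system that needs two of three units retains it with probability 0.972.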

After we are able to develop fully unattended net- 
works on Earth, we can apply the technologies devel- 
oped to Space Exploration Initiative (SEI) networks 
where new difficulties arise due to longer distances and 
higher data rates. 

2.2.1 Assessment of Network Management 
Technologies 

The major problem in implementing a distributed com- 
munication network system is how to take action at a 
temporal and spatial distance. For example, when a 
communications link fails, new routing strategies must 
be coordinated not only at the nodes involved, but also 
at additional nodes involved in the alternative routing 
strategy. The problem solver must reason about and coordinate
the remote effects of local decisions while reasoning
with indeterminate knowledge.
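The link failure example can be sketched as a route search over the network graph. The topology and node names are hypothetical, and the search assumes a single node with complete knowledge of link status, which is exactly the knowledge a distributed system cannot count on; the sketch shows only the routing step, not the coordination problem.

```python
from collections import deque

def shortest_route(links, source, target):
    """Breadth-first search for a fewest-hop route; returns None if unreachable."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical topology: a surface site, two relay satellites joined by a
# crosslink, and Earth (reachable here only through MRS-1).
links = {("surface", "MRS-1"), ("surface", "MRS-2"),
         ("MRS-1", "MRS-2"), ("MRS-1", "Earth")}

before = shortest_route(links, "surface", "Earth")
after = shortest_route(links - {("surface", "MRS-1")}, "surface", "Earth")
```

When the surface-to-MRS-1 link fails, the alternative route passes through MRS-2 and the crosslink, so MRS-2 becomes an additional node that must take part in the new routing strategy.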

The problems are exacerbated as the scale of the system
increases. For example, it may not be possible for
a node to immediately determine the status of another
node or activate it in time to ensure that network connectivity
is maintained. Even current world-wide communications
systems (e.g., NASCOM) have serious problems
with synchronizing and coordinating communications
services. These problems can only be expected to
increase as we attempt to expand the communications to
include lunar and Mars-based communications nodes.


Protocol Approaches for Managing Network Resources

The use of protocols has been widely accepted as an
effective means for managing local area and wide area
networks in multi-vendor environments. These protocols
include:

• Common Management Information Protocol 
(CMIP), 

• Common Management Information Services and Protocol
over TCP/IP (CMOT), and

• Simple Network Management Protocol (SNMP). 

The most widely accepted and used protocol is the 
SNMP. 

There are two components to the implementation of
SNMP: an agent and a network management station.
The agent is software found on a variety of network el- 
ements (bridges, routers, file servers, etc.). The agent 
collects network statistics for the element on which 
it resides. The agent forwards the information when 
requested by a network management station or when 
an event occurs. Other network monitor and control 
tasks include evaluating the status of various network 
elements, reviewing error situations, and dynamically 
rerouting network traffic around network nodes which 
are heavily loaded. Due to its message passing and dis- 
tributed topology, the SNMP has a design framework 
which lends itself to implementing a Distributed Artifi- 
cial Intelligence (DAI) system. 
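The agent and management station roles described above can be sketched as two cooperating objects. This is a toy model of the message pattern only (poll on request, report on event); it does not implement the SNMP wire protocol, and the class names, counters, and error threshold are assumptions made for illustration.

```python
class Agent:
    """Collects statistics for one network element and answers manager polls."""

    def __init__(self, element, trap_sink, error_threshold=3):
        self.element = element
        self.trap_sink = trap_sink            # callable that receives event reports
        self.error_threshold = error_threshold
        self.stats = {"packets": 0, "errors": 0}

    def record(self, ok):
        """Update local counters; report an event when errors hit the threshold."""
        self.stats["packets"] += 1
        if not ok:
            self.stats["errors"] += 1
            if self.stats["errors"] == self.error_threshold:
                self.trap_sink(self.element, "error threshold reached")

    def get(self, name):
        """Answer a poll from the management station."""
        return self.stats[name]


class ManagementStation:
    """Polls registered agents and logs the events they report."""

    def __init__(self):
        self.agents = {}
        self.events = []

    def register(self, element):
        agent = Agent(element, self.on_trap)
        self.agents[element] = agent
        return agent

    def on_trap(self, element, message):
        self.events.append((element, message))

    def poll(self, name):
        return {element: agent.get(name) for element, agent in self.agents.items()}


station = ManagementStation()
router = station.register("router-A")  # hypothetical element name
for ok in (True, False, False, True, False):
    router.record(ok)
```

Because each agent keeps its own state and exchanges only messages with the station, the same structure could host more capable, reasoning agents, which is the sense in which the framework lends itself to DAI.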


2.2.1.1 The Role of Distributed Artificial Intelligence

Distributed Artificial Intelligence (DAI) is a branch of 
the AI discipline dealing with the cooperative solution 
of problems by a distributed and decentralized group of 
agents. The agents can be simple or complex processing 
elements. 

DAI deals with problems that develop when a group 
of loosely-coupled problem-solving agents cooperate to 
solve a problem. In such a problem-solving environ- 
ment, each of the agents has a limited amount of knowl- 
edge of the problem and can only gain this knowledge 
through communication and coordination with other 
agents. DAI falls into two categories: 

i. Distributed Problem Solving (DPS) 

ii. Multi-agent systems 

By combining conventional software problem solv- 
ing techniques with emerging AI technologies, DAI is 
able to solve complex problems in a distributed envi- 
ronment. In a telecommunication network, by applying 
a framework similar to the SNMP, the status of each 
element or node on the network can be monitored and 
any faults can be detected with conventional software 
or hardware technologies. Judicious selection of AI and 
conventional methodologies can be employed to man- 
age the faults. 

Modeling and simulation techniques can be em- 
ployed to monitor the performance of the processing 
elements within the network. The performance infor- 
mation will help in identifying the constraints within the 
network. The information about the network constraints 
can then be used for managing the resources within the 
network. 

The following are some of the benefits of Distributed 
Artificial Intelligence: 

• Inherent parallelism in the approach speeds up 
computations and problem solving. 

• Reliability and survivability are improved through
redundancy.

• Helps to achieve increased modularity and recon- 
figurability. 

• Accommodates open systems (systems with no 
complete representation and with dynamically 
changing boundaries). 
• Adaptability: concurrent systems are inherently
more adaptable than sequential systems.

• Multiple perspectives: different problem solvers 
can bring several perspectives to bear on one prob- 
lem. 

2.2.1.2 Fault Management Technologies

AI technologies such as diagnostic expert systems and
artificial neural systems can be used for detecting, diagnosing,
and correcting network faults. There are currently
many excellent off-the-shelf software packages
that provide adequate functionality to support development
of robust AI fault management systems.

For simple subsystems, rule-based systems with predictably
robust performance are easy to develop, but
complex systems often require more powerful strategies
such as model-based reasoning paradigms. Model-based
expert systems use an internally defined model to
reason about the system, both to trace causal relationships
and to explore possible correction strategies. Hybrid
approaches often prove to be the most workable
and effective systems.

Artificial neural systems provide excellent means
(learning paradigms and models) for detecting errors
and classifying errors (diagnosing). Their principal
benefits are the ability to provide robust response over
widely varying conditions, the ease of programming
(training) them to handle new situations, and their capability
to provide real-time response once trained.

A combination of the use of artificial neural systems 
and expert systems is ideal for fault management. Re- 
covery from network faults can be represented as a set 
of rules which will operate on the results from the Arti- 
ficial Neural System (ANS) and model-based reasoning 
systems. 
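The division of labor described above (a classifier produces a diagnosis, and rules act on that diagnosis) can be sketched as follows. A nearest-centroid classifier stands in for a trained neural system; it is not a neural network. The features, fault classes, centroid values, and recovery actions are all hypothetical.

```python
from math import dist

# Stand-in for a trained neural classifier: nearest centroid over two
# hypothetical features (signal-to-noise ratio in dB, bit error rate exponent).
CENTROIDS = {
    "nominal": (12.0, -7.0),
    "antenna_mispointing": (4.0, -4.0),
    "receiver_failure": (0.0, -1.0),
}

def diagnose(sample):
    """Return the fault class whose centroid is closest to the sample."""
    return min(CENTROIDS, key=lambda label: dist(sample, CENTROIDS[label]))

# Recovery rules operate on the diagnosis, not on the raw measurements.
RECOVERY_RULES = {
    "nominal": "no action",
    "antenna_mispointing": "re-run antenna pointing acquisition",
    "receiver_failure": "switch to redundant receiver",
}

def recover(sample):
    """Classify a measurement and look up the matching recovery action."""
    fault = diagnose(sample)
    return fault, RECOVERY_RULES[fault]

result = recover((3.5, -4.5))
```

Keeping the recovery rules separate from the classifier means the classifier can be retrained for new conditions without rewriting the recovery logic.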

2.2.1.3 Configuration Management Technologies 

Configuration management is a prerequisite for effec- 
tive application of the other elements of network man- 
agement. The data base of all network components 
(e. g., hardware, software, circuits or lines) must be 
made available to help in scheduling and tracking of 
changes to the configuration. 

Techniques for Configuration Management. One
approach, using a hybrid rule/frame-based methodology,
applies a set of rules that specify what a complete or
legal solution must include when presented with a set of
initial choices from among a set of options, together with
their implications or constraints. It must also apply conflict
resolution rules to arrive at a legal solution.

A machine learning technique can be developed and
trained with a set of examples to recognize patterns
and thereby learn how to configure the network
when certain patterns appear in the system. Such an
adaptive system is preferable to a rule-based system, which
requires new rules to be developed whenever a situation
arises that is not represented in the rule base, and for
which there may be too many rules to write. For
an unattended network system, an adaptive system is
highly recommended.

Case-based reasoning systems can be applied in configuring
a communication network system. If knowledge
of possible cases exists, then by using a library of
past cases the system can reconfigure the network autonomously.

The monitoring part of configuration management 
can be performed by: 

i. A backward-chaining, rule-based diagnostic expert
system which runs repeatedly until a particular recommendation
is encountered and then raises an alarm to the
appropriate embedded system, or

ii. A forward-chaining system that monitors data patterns
and informs the controlling function when a
particular pattern is encountered.
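The forward-chaining alternative in item ii can be sketched in a few lines: rules fire whenever all of their conditions are among the known facts, and firing continues until nothing new can be derived. The fact names and rules are invented for illustration and do not represent an actual TNIM rule base.

```python
def forward_chain(facts, rules):
    """Fire rules until no new facts can be derived.

    rules: list of (set_of_required_facts, derived_fact).
    Returns the full set of facts, including derived ones.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical monitoring rules; the fact names are illustrative only.
RULES = [
    ({"carrier_lost", "satellite_in_view"}, "link_fault"),
    ({"link_fault", "crosslink_available"}, "reroute_via_crosslink"),
    ({"reroute_via_crosslink"}, "notify_controller"),
]

observed = {"carrier_lost", "satellite_in_view", "crosslink_available"}
derived = forward_chain(observed, RULES) - observed
```

The controlling function is informed only when the chain reaches `notify_controller`, so observed data that match no complete pattern produce no notification.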

2.2.1.4 Accounting Management Technologies 

Data being generated at different nodes of the network 
will need to be buffered and retrieved in an efficient 
manner and in a form understood by a client agent. It is 
necessary to investigate efficient data acquisition, stor- 
age and retrieval technologies for managing the infor- 
mation. A combination of a conventional approach to 
data management and object-oriented database manage- 
ment technologies should be investigated to handle data 
generation at each node and data distribution to request- 
ing agents within the network. 

2.2.1.5 Performance Management Technologies 

Performance management technologies measure and
analyze resource utilization and network response time.
They provide engineering traffic statistics to aid in predicting
network performance.
Recommended solution approaches may include conventional
discrete event simulation technologies and
knowledge-based simulation techniques. Application 
of object-oriented knowledge representation schemes 
will aid system modularity and extensibility. 

2.2.1.6 Security Management Technologies 

Technologies that guarantee the correct coordination of 
the agents (software) are required for preventing incor- 
rect activation and execution of other agents. These 
technologies are necessary to ensure a secure and re- 
liable system. 

2.2.1.7 Resource Management Technologies 

For small and simple tasks, forward-chaining rule-based
expert systems or operations research techniques (linear
programming) can be used. For mid-size or large and
complex resource management tasks, the use of hybrid
tools (object-oriented and rule-based reasoning) is
necessary.

2.2.2 Assessment of Fault Tolerant Technolo- 
gies 

A key issue which was identified in this study was the 
extent to which the network should be fault tolerant. 
The TNIM system must provide significant levels of
fault tolerance at the system, subsystem, and
component levels to achieve its mission. This is driven by a
number of considerations:

• Minimal availability of manpower and resources 
at Mars or in Mars orbit to repair malfunctions. 

• Limited capability for remote diagnosis due to the 
difficulties of Mars-Earth communications (time 
delay, pointing, bandwidth). 

• Critical support requirements during maneuvers 
such as aerobraking and launch. 

• Long mission life including both transit from Earth 
to Mars and on-site and on-orbit at Mars. 

• Need for man-rated reliability during the later 
phases of the Space Exploration Initiative (SEI). 

Most importantly, the long journey time between
Earth and Mars and the remoteness of the Mars
location dictate that all SEI subsystems be engineered so
that they are immune to multiple failures or that
mission objectives can be safely carried out in the face of
significant loss or degradation of component functions.
This requirement is even more critical than for previous
missions such as Mercury, Apollo, Skylab, or the
Shuttle flights since SEI missions will be inherently longer
in duration.

The fault tolerance technologies study dealt with 
both system and component level technologies required 
to achieve the above goals. Its objective was to un- 
derstand the potential failure modes of TNIM equip- 
ment, particularly communications satellites and unat- 
tended switching equipment, and to describe technolo- 
gies which must be developed to improve potential fault 
tolerance of TNIM systems. 

As part of this study we conducted a literature search 
of existing methods for fault tolerant systems engineer- 
ing and interviewed Loral satellite operations personnel 
familiar with communications satellite failure modes. 
Within the resource limits of the study we attempted 
to collect information about existing communications 
satellite systems and their failure modes, and the
approaches that have been used by both deep space
missions and DoD survivable satellite programs, where
autonomous fault tolerance is a critical mission
requirement. We also studied a number of NASA, DoD, and
commercial communications programs with significant 
reliability and fault tolerance requirements. 

A fifteen-step process based on existing NASA and 
DoD reliability engineering methods was identified. 
The initial steps of this process were followed to iden- 
tify potential sources of failure at the system level and 
at the subsystem level in the most critical and least ser- 
viceable TNIM component, the Mars Relay Satellites 
(MRS’s). 

The potential failure modes of the MRS’s were deter- 
mined and characterized by subsystem and time in the 
launch, deployment, and maintenance life cycles of the 
satellite. This analysis was derived from existing op- 
erational knowledge of earth-synchronous communica- 
tions satellites. 

For each potential subsystem failure, key technolo- 
gies which might reduce system failure probability were 
identified and described in tabular form (Table 5-3). 
Four key areas are identified as critical and analyzed 
in detail: (1) attitude control, (2) communications, (3) 
power, and (4) data recorders. 

The following technology areas are identified as key 
areas for the development of fault tolerant technologies 
in support of TNIM:

• Intelligent control systems, particularly for space- 
craft attitude and power. 

• Reliable, high-power (> 50 W) Ka-band rf power 
amplifiers. 

• Space-qualified data recording systems with re- 
duced number of mechanical parts. 

• Fiber optics and micro-mechanical components for
inertial reference units.

• Reliable Earth, Mars, sun, and/or star sensors with 
reduced mechanics. 

Moreover, it is recommended that a comprehensive 
set of fault tolerance goals should be established for 
TNIM. At the earliest possible point, a system-level 
approach for TNIM fault tolerance should be estab- 
lished to guide subsystem design and technology devel- 
opment effort. We believe that the issue of fault toler- 
ance and reliability for TNIM systems requires substan- 
tially more study than was feasible within the resources 
of this study. 

2.3 Technology Development Plan 

In this study we present a plan for developing the 
necessary technologies for unattended telecommunica- 
tions network management. We recommend the de- 
velopment of a testbed for (1) using new technologies 
and innovative use of existing technologies to deter- 
mine optimum performance approaches for implement- 
ing Mars unattended communications networks and for 
(2) demonstrating the feasibility of applying these tech- 
nologies to the Mars mission requirements. 

We divide the development of this testbed into the 
following stages: 

i. Initial simulation and modeling 

ii. Earth-based networking 

iii. Lunar operations 

iv. Preliminary Mars-based operations 


2.3.1 Testbed for Managing Network
Resources

We propose that NASA fund an effort to create an integrated
testbed which would initially utilize Simple Network
Management Protocol (SNMP) concepts to model
agent and network components of a series of TNIM ar- 
chitectures. Initial modeling would take place within 
a single workstation with software processes designed 
to simulate TNIM nodes and links. Such a simula- 
tion would be highly parametrized and include specific 
software elements to model communications delays and 
outages based on orbit geometry and known models 
of deep space communications interference. Specific 
faults and communications loads could be introduced 
into the model and the performance of the model under 
various conditions could be monitored and analyzed. 

The initial strategy would be to purchase an off-the- 
shelf network management package based on SNMP 
and insert software elements which would support de- 
tailed modeling of TNIM nodes, links, and management 
functions. 
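
The single-workstation simulation described above can be sketched as a small discrete-event model in which messages move hop by hop over links with fixed delays and scheduled outages. All node names, delays, and outage windows below are hypothetical placeholders; an actual testbed would derive them from orbit geometry and interference models. The sketch assumes outage windows for each link are sorted and do not overlap.

```python
import heapq


def simulate(messages, links, outages):
    """Deliver messages hop by hop and return (arrival_time, path) pairs.

    messages: list of (send_time, path), where path is a list of node names
    links:    dict mapping (a, b) to a one-way delay in seconds
    outages:  dict mapping (a, b) to sorted (start, end) windows without service
    """
    events = [(t, i, 0) for i, (t, _) in enumerate(messages)]
    heapq.heapify(events)
    delivered = []
    while events:
        now, msg_id, hop = heapq.heappop(events)
        path = messages[msg_id][1]
        if hop == len(path) - 1:
            delivered.append((now, path))
            continue
        link = (path[hop], path[hop + 1])
        # Hold the message until any outage on this link has ended.
        depart = now
        for start, end in outages.get(link, []):
            if start <= depart < end:
                depart = end
        heapq.heappush(events, (depart + links[link], msg_id, hop + 1))
    return delivered


# Hypothetical parameters for illustration only.
links = {("habitat", "MRS1"): 0.07, ("MRS1", "Earth"): 600.0}
outages = {("MRS1", "Earth"): [(100.0, 400.0)]}
messages = [
    (0.0, ["habitat", "MRS1", "Earth"]),
    (150.0, ["habitat", "MRS1", "Earth"]),
]
result = simulate(messages, links, outages)
for arrival, path in result:
    print(f"{arrival:8.2f} s  {' -> '.join(path)}")
```

The second message reaches the relay during the outage and is held until the window closes, which is the kind of effect the testbed would measure under different fault and load scenarios.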

2.3.2 Objectives of the Testbed

The proposed test-bed would have the following objec- 
tives: 

• Identify candidate TNIM distributed network man- 
agement architectures and provide preliminary es- 
timates of network performance under anticipated 
communications scenarios. 

• Test the performance of routing protocols and their 
ability to provide connectivity and dynamic re- 
sponse to network loading. 

• Identify and test strategies for routing network traf- 
fic under both routine link interruptions and equip- 
ment failure. 

• Support an iterative process of discovery of new 
TNIM network management requirements through 
experiment, observation, and analysis. 

• Support the investigation of the effectiveness of
tools for DAI in unattended communications
network applications.
2.4 Conclusions & Recommendations

The primary conclusion of this study is that there is a 
great deal of technology available for unattended net- 
work operation that has not yet been applied to manned 
space networks. Current communications and naviga- 
tion networks used to support manned space missions 
are extremely manpower intensive, far more so than is 
necessary with current technology. Much automation 
of these networks must be done before they can be used 
to support manned operations over the long distances to 
Mars. 

Implementation of the following recommendations
could reduce the magnitude of the technology gap between
what is presently available commercially and what is
used for NASA space communications. Implementation
would better prepare NASA for future manned
space exploration initiatives.

• Develop a comprehensive set of fault tolerance and 
fault prevention objectives for future NASA com- 
munication networks. 

• Study, develop, test, and implement TNIM fault 
tolerance technologies as they apply to plans for 
future NASA space-based communications sys- 
tems. 

• Develop a comprehensive set of network manage- 
ment objectives and standards for future NASA 
communication networks. 

• Study, develop, test, and implement new net- 
work management technologies as they apply to 
plans for future NASA space-based communica- 
tions systems. 
Chapter 3

Baseline Mars TNIM Network Architecture 


Several Mars Telecommunications, Navigation, and 
Information Management (TNIM) architectural issues 
had to be resolved prior to defining the TNIM func- 
tions to be automated. Contained in this chapter is the 
background information compiled by the study group 
while forming a baseline TNIM architecture to work 
with. While the information contained in this chapter 
was not specifically requested, it was thought that inclu- 
sion of this background information might aid the reader 
as an additional reference. Individuals already familiar 
with the Mars TNIM architecture proposed in NASA’s 
90 Day Study [1]¹ may wish to proceed to Chapter 4.

This chapter is organized as follows: 

3.1 90 Day Study Architecture 

3.2 Proposed Baseline Enhancements to the Net- 
work 

3.3 Resulting Line-of-Sight Visibility and Re- 
dundancy Between Enhanced Nodes 

3.4 Possible Data Transmission Methods for 
Mars Network 

3.5 References 

3.1 90 Day Study Architecture 

The 90 Day Study provided several baseline approaches 
for Lunar/Mars exploration. These were referred to as 
reference approaches 1 through 5. These baselines for 
mission requirements were modified and focused to re- 
spond to updated requirements for two mission options, 
1 and 5. A preliminary Telecommunications, Naviga- 
tion, and Information Management (TNIM) architec- 
ture was developed to meet the combined requirements 
of options 1 and 5. This preliminary network architecture,
depicted in Figure 3-1, would be in place to support
TNIM functions for unmanned robotic reconnaissance
and manned operations on the Moon and Mars.

¹ References are given at the end of the chapter.

3.1.1 Preliminary TNIM Architecture 

A single Lunar Relay Satellite (LRS) in halo orbit about 
the lunar L2 libration point would enhance communica- 
tions and navigation coverage at the Moon by provid- 
ing Lunar far side visibility that could be used to sup- 
port communications with astronomical observatories 
and rovers. 

Initially, Mars Relay Satellites (MRS) located in are- 
osynchronous orbits about Mars would support un- 
manned surveying missions such as a Mars site recon- 
naissance orbiter and surface rovers. The MRS’s will 
provide telecommunications relay support as well as ra- 
diometric navigation support to all users. 

As the 90 Day Study points out, placing the MRS’s in 
areosynchronous orbits would greatly improve connec- 
tivity between points on the Mars surface and the Earth. 

The use of two MRS’s located as seen in Figure 3- 
1 would give 24 hour/day coverage between the Mars 
habitat (located in view of both relay satellites) and the 
Earth. This does not include the outages that occur dur- 
ing solar conjunction when the Sun, Earth, and Mars 
line up radially. It has been assumed that manned
missions to Mars would be scheduled around periods of
solar conjunction, which will occur about every 26 months
and last for about one week.

MRS’s also provide improved Mars local
communications visibility. A non-synchronous relay satellite
will not provide continuous connectivity between
surface users at Mars. Also, the line of sight direction
between Mars surface users and the MRS’s remains
constant, assuming the areosynchronous orbits are properly
maintained. This means surface user antenna pointing
and scheduling systems will be greatly simplified since
MRS antenna pointing will be fixed relative to the user.

Figure 3-1: 90 Day Study Telecommunications, Navigation, and Information Management Architecture


The MRS proposed would support communications 
along four different paths including: 

1. Mars surface user to Earth 

2. Mars orbiting user to Earth 

3. Mars surface user to surface user long distance 

4. Mars surface user to Mars orbiting user 

These communications paths are represented in Fig- 
ures 3-2 through 3-5. 

3.1.2 Key Requirements and Challenges 

The 90 Day Study pointed out several key requirements
and challenges that the Mars communications network
will present.


1. Network must run unattended at Mars. TNIM 
functions should therefore be performed in the ab- 
sence of man at Mars. 

2. Network must provide fault tolerant system con- 
nectivity for all links. 

3. Network must provide 90% connectivity with the 
manned habitats and 98% of scheduled transmis- 
sions. 

4. Maximum data rates for Mars to Earth transmis- 
sions will reach 10 Mb/s for compressed, high rate 
video transmissions. 

5. Maximum data rates for Mars local transmissions 
will reach 50 Mb/s to support rover telerobotics. 

This study concentrated on the evaluation of items
1 through 3 above. We examined the network functions
that could be automated and how they might be
made fault tolerant while satisfying the connectivity
requirements. Technologies needed to provide high
Earth/Mars data rates (items 4 and 5 above) were
assumed to be outside the scope of the study.

Figure 3-4: Mars Surface User to Surface User Path

Figure 3-5: Mars Surface User to Orbiting User Path

Needless to say, the key challenges listed above are 
all interdependent. Satisfying the high Earth/Mars and 
local data rate requirements adds complexity to the 
communications systems by the use of state-of-the-art 
hardware and software. Unfortunately, state-of-the-art 
communications hardware may not be as reliable as 
older, proven, communications equipment. 

While the two-MRS network proposed in the 90 day 
study will provide near 100% connectivity between 
Earth and the manned habitat, it would not provide
visibility to the far side of Mars, nor would it provide
sufficient system-level fault tolerance. For this reason and
others, several enhancements to the network were
considered that provide better connectivity, improved
surface and orbit visibility, greater redundancy, and hence
improved fault tolerance.

3.2 Proposed Baseline Enhancements 
to the Network 

Prior to examining the network management technolo- 
gies required to support unattended Mars TNIM op- 
erations, several optional network enhancements were 
considered for incorporation into the Mars Network de- 
sign. While the two-MRS design satisfied the minimal 
requirements for connectivity with Earth and the local 
Mars environment, it left few options for an unattended 
fault management system should one of the two MRS’s 
become disabled. 

In addition, this system would not provide
communications or navigation support to scientific instruments
or manned outposts above 75° latitude or to the far side
of Mars.

For these reasons the following network options or
enhancements were considered for possible inclusion
into an unattended, fault tolerant network at Mars:

Incorporation of crosslink capabilities. Adding
crosslink capability to the MRS’s greatly
enhances the network’s path redundancy by
permitting an MRS not in view of both the sending
node and receiving node to relay the signal to an
MRS that can complete the connection. This
feature combined with a third MRS (discussed below)
provides users with redundant paths for
communications that can be used to either double
transmission rates or provide backup communications
should an MRS fail.

Addition of a third MRS. Adding a third Mars Relay 
Satellite (MRS) would provide habitat connectiv- 
ity with the far side of the planet by providing 
coverage of the entire Mars surface excluding the 
poles. In addition, when coupled with crosslink ca- 
pabilities, the manned habitat as well as other users 
may exercise the third MRS as part of a backup 
link either to Earth or to other points on the Mars 
surface. 

Addition of a Mars Polar Relay Satellite (MPRS). 
Adding an MPRS to the network will provide reg- 
ular communications and navigation coverage at 
the north and south poles of Mars. These polar regions
are not visible from synchronous altitude and
would therefore have to relay information back to 
the manned habitat or MRS via an MPRS. 

Addition of a Mars Communications Hub (MCH). 
An MCH located at the manned habitat could pro- 
vide additional transmit and receive capabilities 
with Earth through the use of its own communica- 
tions system. The MCH could then operate even if 
both MRS’s in view of the manned habitat should 
fail. This would provide a manual backup capabil- 
ity to the astronauts. 

The MCH could also serve as a concentrator of 
Earth/Mars communications, reducing the com- 
plexity of the MRS design in terms of commu- 
nicating with Earth. The Earth/Mars link would 
essentially be controlled from this single location 
as opposed to distributing the control to each of 
the MRS’s. This use of the communications hub 
would give the astronauts the potential to take 
corrective action in controlling the Mars network 
should its automated operations fail. 
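
The path redundancy gained from crosslinks and a third MRS can be illustrated with a simple reachability search. The topology below is a hypothetical snapshot in which MRS 2 is occulted from Earth; the node names and link list are assumptions for this sketch, and the direct MCH-to-Earth link is omitted.

```python
from collections import deque


def find_path(links, src, dst, failed=frozenset()):
    """Breadth-first search for a fewest-hop path that avoids failed nodes."""
    if src in failed or dst in failed:
        return None
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical snapshot: the habitat sees MRS1 and MRS2, all three MRS's are
# crosslinked, and only MRS1 and MRS3 currently have a view of Earth.
LINKS = [
    ("Habitat", "MRS1"), ("Habitat", "MRS2"),
    ("MRS1", "MRS2"), ("MRS2", "MRS3"), ("MRS1", "MRS3"),
    ("MRS1", "Earth"), ("MRS3", "Earth"),
]

nominal = find_path(LINKS, "Habitat", "Earth")
backup = find_path(LINKS, "Habitat", "Earth", failed={"MRS1"})
print(nominal)
print(backup)
```

With MRS 1 failed, the search falls back to the route through MRS 2 and its crosslink to MRS 3, which is the backup behavior the enhancements are intended to provide.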

Figure 3-6 shows the resulting Mars network archi- 
tecture. This figure excludes the MPRS which if drawn 
would have its orbit plane perpendicular to the page. 
The shaded regions on Mars correspond to regions that 
have redundant MRS coverage. These regions are each
about 30° wide at the equator, with the exact angular
width dependent on how closely users’ links can approach
the Mars horizon.
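
The dependence of the redundancy zone width on the minimum usable elevation angle can be checked with a short geometric calculation. The physical constants below are standard published values for Mars rather than figures from this study, and the 5° elevation limit is an assumption chosen for illustration.

```python
import math

GM_MARS = 42828.37      # km^3/s^2, Mars gravitational parameter (standard value)
R_MARS = 3396.2         # km, Mars equatorial radius (standard value)
T_SIDEREAL = 88642.66   # s, Mars sidereal rotation period (standard value)


def synchronous_radius(gm=GM_MARS, period=T_SIDEREAL):
    """Radius of a circular orbit whose period equals the rotation period."""
    return (gm * period ** 2 / (4 * math.pi ** 2)) ** (1 / 3)


def coverage_half_angle(orbit_radius, min_elevation_deg, planet_radius=R_MARS):
    """Central angle (degrees) from the sub-satellite point to the coverage edge."""
    e = math.radians(min_elevation_deg)
    return math.degrees(math.acos(planet_radius * math.cos(e) / orbit_radius) - e)


def redundancy_zone_width(half_angle_deg, n_satellites=3):
    """Equatorial longitude overlap (degrees) between adjacent, equally spaced satellites."""
    return max(0.0, 2 * half_angle_deg - 360.0 / n_satellites)


r = synchronous_radius()
half = coverage_half_angle(r, min_elevation_deg=5.0)
width = redundancy_zone_width(half)
print(f"orbit radius   : {r:8.0f} km")
print(f"coverage limit : {half:8.1f} deg from the sub-satellite point")
print(f"overlap width  : {width:8.1f} deg")
```

Under these assumptions the coverage limit falls near 75° and the overlap between adjacent satellites is close to 30°, consistent with the latitude limit and zone width quoted in this chapter. A lower elevation limit widens the zones and a higher one narrows them.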
Figure 3-6: Enhanced Mars TNIM Network Architecture

3.3 Resulting Line-of-Sight Visibility
and Redundancy Between
Enhanced Nodes

The line-of-sight (LOS) visibility between nodes
resulting from a Mars communications network that
incorporates the previously mentioned enhancements would
more than meet the connectivity requirements presented
in the 90 Day Study. This over-designing of the network
provides a built-in system redundancy that will greatly
enhance the fault-tolerant qualities of the system while
at the same time expanding the surface coverage to
include the far side of the planet and Mars’ poles.

Redundant links between Mars and Earth can also
be used to double the data rates from Mars to Earth
by transmitting through both paths simultaneously.
Arrayed 34 meter DSN antennas will be capable of
receiving data simultaneously from multiple MRS’s at
different frequencies. If optical communications
were used, an Earth Relay Satellite (ERS) would
be able to receive optical signal transmissions
from different MRS’s simultaneously.

The resulting Earth/Mars communications visibility 
can be summarized as follows: 

1. Every point on the Mars surface between -75° and
+75° latitude will be in view of at least one MRS at
all times. This MRS may or may not be in direct line
of sight (LOS) of Earth, but it will be in LOS of both
other MRS’s.

2. Some areas on the Mars surface will see two 
MRS’s at all times. These areas can be divided into 
three regions equally spaced around the equator, 
each about 30° wide in longitude (see Figure 3- 
6). We shall refer to these regions as redundancy 
zones. 

3. MRS locations in Martian sky will be fixed rela- 
tive to surface users if MRS orbits are maintained 
properly. 

4. MRS No. 1 and No. 2 positioning will be such that 
the redundancy zone between them will include the 
manned habitat. Consequently the habitat will be 
in view of both MRS No. 1 and No. 2 at all times. 
One or both of these MRS’s will be in view of the 
Earth at all times except during solar conjunction 
when neither MRS will be in view of the Earth. 

5. A second independent path for MCH to Earth
communications will be possible at all times for users
in the redundancy zones. If both MRS’s are in
view of Earth then a redundant communications
path can be utilized to communicate through both
MRS’s simultaneously to Earth, doubling the
effective Earth/Mars data rates. The second path
could also be used as a hot backup should the
primary fail.

6. If one of the two MRS’s in view of the redundancy
zone has its view of Earth occulted by Mars, it can
use its crosslink capability to relay data to MRS
No. 3 and back to Earth.

3.4 Possible Data Transmission
Methods for Mars Network

The following sections are provided as background
material regarding the various data transmission
approaches that might be used for Mars TNIM data
transmissions.

3.4.1 General Approaches

Mars Relay Satellites (MRS’s) will be used to establish
the communication links described in the previous
paragraphs. It is assumed that the final configuration will
have the capability to do crosslinking between satellites.

This configuration depicts a constellation of three
MRS’s equally spaced in synchronous orbit around the
planet Mars. Each MRS can see a user group. These
user groups can communicate with each other through
a crosslink with the adjacent satellite.

Each MRS will be equipped with three sets of
antennas:

i. One points towards planet Earth;

ii. Another points towards Mars and Mars
low-orbiting spacecraft; and

iii. A third set looks to the second and third MRS.

The MRS antenna pointing towards Mars will have a
3 dB footprint covering about one third of Mars’
surface. Furthermore, this antenna will support multiple
beams of smaller beamwidths and will be steerable. The
crosslink and Earth pointing antennas will be parabolic
and steerable.

In the MRS, the communications paths can be
established through a bent pipe approach or through an
on-board switching approach. The bent pipe approach
will allow signals to be routed to their destination via
the satellite at RF without processing. The on-board
switching approach provides demodulation, signal
processing, and signal routing via an intelligent switch
on board the satellite.

3.4.1.1 Bent Pipe Approach

In the bent pipe approach, communications links can
be implemented using the standard access techniques
commonly applied in satellite communication systems.
Considering the volume of traffic, the need for
autonomous operations, and the complexity, the primary
candidates are as follows:

• Single Channel Per Carrier (SCPC)

• Time Division Multiple Access (TDMA)

• Frequency Division Multiple Access (FDMA)

Figure 3-7 shows a bent pipe model. This model can 
be appropriately applied to any of the above communi- 
cation techniques. The MRS will contain transponders 
and antenna control equipment. In this system, there is 
no demodulation of the communication signal and no 
switching is involved. 

Except for some station keeping and antenna point- 
ing on the Mars-to-Earth and the low orbit Mars com- 
munications links, the communication operations are 
independent of the Mars habitat. Each user node or 
user group would be assigned a transmit and associated 
receive frequency. The user paths can be established 
through separate receive frequency chains or through 
techniques utilizing frequency tuning and scanning.

For a user-to-user path for a user group within the
footprint of the local areosynchronous satellite, the user
information would be modulated and transmitted on an
assigned carrier frequency to the satellite. This signal
will then be translated and broadcast to the Mars
surface to a terminal tuned to the translated frequency. In
the case where the user group is not within the
footprint, communications will be established through the
crosslink.

The three access techniques which can employ this 
model are discussed in the following paragraphs. 

Single Channel Per Carrier (SCPC). The assigned 
carriers are translated at the satellite, and simul- 
taneously retransmitted back to the local user’s 
group on Mars, crosslinked to the adjacent satel- 
lites to be transmitted to Mars, and/or transmitted 
to Earth. On Mars, the destination node will have 
a receive chain tuned to the translated frequency. 

For duplexing, the reverse process (from the
destination) happens simultaneously. The
communication to and from Earth will utilize a designated
frequency set. The data transmitted can be either
digital or analog. Careful frequency planning is
required in this mode of operation. Each user must
be assigned a separate carrier frequency, and the
Mars-to-Earth communications band must be set
aside separately.

Frequency Division Multiple Access (FDMA). FDMA
employs multiple channels on subcarriers
which in turn are modulated onto a single RF
carrier. When this scheme is used in the bent pipe
configuration, all the users on the network receive
the same RF signal from the sending station. In
order to communicate with the sending station, the
receiving station must acquire the carrier,
downconvert it, and demodulate the subcarrier signal to
obtain the data. In this scheme a set of subcarrier
frequencies is designated for each communication
channel. On most satellite communications
paths on Earth, FDMA is used for relatively
narrow band communications and for medium to high
data rate services.

Time Division Multiple Access (TDMA). Only one 
transmit and the companion receive frequency are 
used. This type of communication system requires 
a reference site, a time burst plan for the user on 
the network, and a closely grouped user set The 
key to successful operation is synchronization of 
the time slot available to the user. Large time allo- 
cation translates to more data transmitted over the 
total transponder bandwidth. 

Because of the delay incurred in the link between 
areosynchronous satellites, it is not practical to 
have a TDMA network outside of the area covered 
by one satellite. 

The delay relative to the round trip reference
channel delay cannot exceed the frame length of the
TDMA burst plan. This technique will have its
principal application in local user group
communication cases. Since the TDMA system uses only one
carrier frequency, the transponders can operate at
saturation with no intermodulation effects.
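
The crosslink delay argument can be made concrete with a short calculation. The orbit radius of about 20,428 km is the approximate areosynchronous radius and the 2 ms frame length is a hypothetical value chosen for illustration; neither figure comes from this study.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def crosslink_delay_ms(orbit_radius_km, separation_deg=120.0):
    """One-way propagation delay between two satellites on the same circular orbit."""
    chord = 2 * orbit_radius_km * math.sin(math.radians(separation_deg) / 2)
    return chord / C_KM_S * 1000


def fits_in_frame(extra_round_trip_ms, frame_ms):
    """Check that delay relative to the reference does not exceed the frame length."""
    return extra_round_trip_ms <= frame_ms


ORBIT_RADIUS_KM = 20_428.0   # assumed areosynchronous orbit radius
FRAME_MS = 2.0               # hypothetical TDMA frame length

one_way = crosslink_delay_ms(ORBIT_RADIUS_KM)
extra_round_trip = 2 * one_way
print(f"crosslink one-way delay : {one_way:6.1f} ms")
print(f"fits in {FRAME_MS} ms frame     : {fits_in_frame(extra_round_trip, FRAME_MS)}")
```

For three equally spaced satellites the added round trip delay is on the order of hundreds of milliseconds, far longer than a frame of a few milliseconds, which supports confining each TDMA network to the coverage area of a single satellite.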

3.4.1.2 On-Board Switching Approach 

In the on-board switching mode, communications links
can also be implemented using SCPC, TDMA, and/or
FDMA techniques. Figure 3-8 shows a switching
configuration model for the Mars Relay Satellite (MRS).
The switching system, however, will include functions
such as demodulation, modulation, signal processing,
and switching. Additional functions that can be
included are forward error correction and data driven
signal routing. On-board equipment health monitoring
is an added complexity.

In the On-Board Switching model, the carrier is 
downconverted, demodulated, and then processed at 
baseband. In the baseband processing, the traffic would 
be sorted out by intended destination, rearranged for 
transmission, and then switched to the designated RF 
carrier, and retransmitted to its destination. 

The processing hardware can include transmultiplexers,
forward error correctors, signal regenerators, data
buffers, switches, and multiplexers. In this model
all data information will be in blocked format having 
both header and error control segments for routing and 
control. The processor will be designed to process data 
in various standard formats. 

The three techniques which can employ this model 
are discussed in the following paragraphs. 

Single Channel Per Carrier (SCPC). The single car- 
rier is downconverted, demodulated into data 
streams, the data headers read, the errors corrected, 
and the data routed through a matrix switch to the 
proper transmission path. At this point the data is 
multiplexed with other data having the same desti- 
nation, modulated onto a carrier and routed to the 
proper destination. 

If the destination is a user node on Mars, that node 
will receive the assigned carrier, demodulate the 
signal, and receive the information. This process 
will work in either the simplex or duplex mode. If 
the destination is on the planet Earth, the data may
be multiplexed with other data before being
transmitted to its destination. This model is
appropriate for high data rate users.

Frequency Division Multiple Access (FDMA). In the 
on-board-switching satellite, signals in the fre- 
quency division multiple access mode would be 
demodulated and processed at baseband. In the 
FDMA mode, the ground based user data will be 
modulated onto the subcarrier or subcarriers des- 
ignated for the receiving parties. The subcarrier 
in turn will be modulated onto the RF carrier and 
transmitted to the satellite. 

Each set of transmit and receive subcarriers will be
designated for a specific communications path. In
the satellite, the FDMA carrier will be
downconverted, demodulated by the receiver, processed by
the transmultiplexer, and routed by the data driven
switch. With the on-board switching and
processing capability, data from all sources can be
multiplexed onto a single carrier and transmitted to the
destination.

Time Division Multiple Access (TDMA). Only one 
transmit and the companion receive frequency are 
used. This type of communication system requires
a reference site, a time burst plan for the users on the
network, and a closely grouped user set. Because
of the delay incurred in the link between MRS’s,
it is not practical to have a TDMA network
outside of the area covered by one satellite. In the
on-board switching model, the TDMA system will use
the switch to direct data bursts to spots on the Mars
surface via the multiple beams of the antenna.

3.4.2 Data Routing Protocols 

In the transmission methods discussed, an efficient
means of establishing connectivity and communication
with the receiver is the application of a layered
architecture. There are two basic levels of protocols that are
appropriate for the transmission methods described.

• One is the transmission and data routing protocols 
within the satellite processor and transmitter. 

• The other is the local area network protocols at the 
user group nodes. 

Transmission and data routing protocols are especially 
applicable in the on-board switching model. In the 
satellite processor, standard frame and other blocked 
data formats will be used. The on-board processor will 
read blocked data headers in accordance with the for- 
mats established for the applicable protocol, extract the 
destination and characterization information, and then 
route the data to the proper transmission media. 
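
The header-driven routing step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the 4-byte header layout (destination, traffic class, payload length), the destination identifiers, and the beam names are all hypothetical, since the study does not fix a block format.

```python
import struct
from collections import defaultdict

# Hypothetical block header: destination ID (1 byte), traffic class (1 byte),
# payload length (2 bytes, big-endian). A real format would be set by the
# protocol selected for the link.
HEADER = struct.Struct(">BBH")

# Hypothetical routing table: destination ID -> output transmission medium.
ROUTES = {1: "surface_beam_A", 2: "surface_beam_B", 3: "earth_link"}


def route_block(block, queues):
    """Read the header, extract destination and traffic class, and queue
    the payload on the transmission medium that serves the destination."""
    dest, traffic_class, length = HEADER.unpack_from(block)
    payload = block[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated block")
    medium = ROUTES.get(dest)
    if medium is None:
        raise KeyError(f"no route for destination {dest}")
    queues[medium].append((traffic_class, payload))
    return medium


queues = defaultdict(list)
block = HEADER.pack(3, 0, 5) + b"hello"
selected = route_block(block, queues)  # queued on the hypothetical Earth link
```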

In the user group environment, standard ISO layered
protocols can be used to achieve the proper network
connectivity between users on Mars. Because a round
trip delay of up to 40 minutes is incurred in the Mars-to-Earth
transmission, special protocol formats may have to be
established for this transmission path to take this time
delay into account, particularly when transmission
requirements call for error-free transmission. Local Area
Network (LAN) protocols reflecting the ISO architecture
are candidates for the Mars surface communications
systems. All the standard network configurations
can be accommodated. This includes the use of packet
switching networks and the X.25 protocol, the use of the
ISO 8473 packet network layer protocol, and the
use of upper application layers on virtual paths.
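
To see why the Mars-to-Earth path may need special protocol formats, consider a simple acknowledge-before-sending-more (stop-and-wait) scheme. The sketch below computes its link efficiency and the amount of data in flight during one round trip. The 1 Mbps rate and 8000-bit frame are assumed values for illustration only; the 40 minute figure is the worst-case round trip delay.

```python
def stop_and_wait_efficiency(frame_bits, rate_bps, rtt_s):
    """Fraction of time the link carries data when the sender waits one
    full round trip for an acknowledgement after every frame."""
    t_frame = frame_bits / rate_bps
    return t_frame / (t_frame + rtt_s)


def bits_in_flight(rate_bps, rtt_s):
    """Data that must be buffered for possible retransmission if the sender
    keeps the link full while waiting for acknowledgements."""
    return rate_bps * rtt_s


RTT_S = 40 * 60  # worst-case round trip delay, in seconds

# Assumed link parameters (not from the study): 8000-bit frames at 1 Mbps.
efficiency = stop_and_wait_efficiency(8_000, 1e6, RTT_S)
window_bits = bits_in_flight(1e6, RTT_S)
```

Under these assumptions the link is idle almost all of the time (efficiency of a few parts per million), and keeping it full would require holding 2.4 gigabits for retransmission. This is why error-free delivery over this path points toward large windows, forward error correction, or both.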

3.4.3 Comparison of the Technologies and 
Suitability for SEI 

The configurations presented in the approaches de- 
scribed above allow communications between users, 
between users and the Mars habitat, or between a user 
and Earth. The particular method of communications 
depends upon the complexity of the user terminal as 
well as the transmission model selected. Tables 3-1 and 
3-2 provide some of the advantages and disadvantages 
of the various approaches. 

Table 3-1: Bent Pipe Communications System - Advantages and Disadvantages

SCPC

Advantages:
• Utilization of proven design technology.
• A simple architecture accommodates high data rates.
• Suitable for growth; expandable to support full Mars comm. system.
• Capable of supporting digital and analog data formats.

Disadvantages:
• Preassigned operation is inefficient for a small number of user groups.
• Central control of frequency pool if demand assignment plan is adopted.
• Inefficient use of power amplification since backoff is required to reduce
IM noise (up to 6 dB).

FDMA

Advantages:
• Preassigned channels can be sized according to traffic conditions.
• Technology for transmitting several 4 MHz TV channels is available.

Disadvantages:
• High power amplifier backoff losses (up to 6 dB).
• Frequency planning is complex.

TDMA

Advantages:
• Utilizes full capacity of transponder per user group.
• Preassigned channels can be sized according to traffic conditions.

Disadvantages:
• Requires a complex control center on Mars.
• Communication coverage limited to footprint of one satellite.
• Not applicable for Mars-to-Earth communications.
• Requires a very accurate burst time plan control.

Table 3-2: On-Board Switching Communications System - Advantages and Disadvantages

SCPC (on-board switching)

Advantages:
• Efficient use of satellite spectrum.
• Flexible, can provide growth to support Mars communication system.
• Can support AM and FM transmission.

Disadvantages:
• Requires much more source power than bent pipe communications system.
• Complex switch state sequence for connectivity matrix.
• Requires transmission and network protocol implementation.
• Requires an on-board state switch.

SCPC (on-board processor)

Advantages:
• Uplink and downlink modulation can be adjusted to optimize transmission
plans.

Disadvantages:
• Complex on-board switching and processing requires use of advanced solid
state technology.

FDMA (on-board switching)

Advantages:
• Supports high density of user support groups.
• Supports single hop TV transmission with present technology.

Disadvantages:
• Requires a call request scheme and central switch control.

TDMA

Advantages:
• All advantages of bent pipe; plus
• Beam switching flexibility provides efficient use of spectrum while
increasing network connectivity.

Disadvantages:
• All disadvantages of bent pipe; plus
• Requires use of complex methods for synchronizing traffic burst and on-board
switch state time plans.
Chapter 4

Overview of TNIM Functions to be Automated 


This chapter presents the Subtask 1 work concern- 
ing “Identification of Functions for Unattended Opera- 
tions” for a Mars Telecommunications, Navigation, and 
Information Management (TNIM) system. Mars TNIM 
will support Mars-based telecommunications and navi- 
gation functions for the proposed Space Exploration Ini- 
tiative (SEI). 

This chapter is organized as follows: 

4.1 Scope and Methodology for Function Identi- 
fication 

4.2 Description of Mars TNIM Management and 
Control Functions 

4.3 Routing and Transmission Protocols 

4.1 Scope and Methodology for 
Function Identification 

This chapter addresses those TNIM network functions 
required to support communications and navigation in 
the Mars environment and expands on those network 
functions common to the entire network. 

An expansion of TNIM functions specific to the users 
and network nodes is assumed outside the scope and re- 
sources of this study. Other SEI study groups that are 
focusing on specific network elements and architectures 
will need to better define the specific TNIM functional 
needs of their design. 

This study assumes the Mars TNIM architecture is 
not fixed at the present time. It only assumes the net- 
work may be composed of some combination of archi- 
tectural elements listed in Chapter 3. 

Network Management functions covered in this
chapter are common to all network nodes and are applicable
to almost all network architectures. These management
functions, sometimes referred to as control functions,
are used to control the TNIM network across
which telecommunications and navigation information
will flow.

The TNIM communications system can be modeled 
as a collection of sites, each of which represents a dis- 
tinct distributed location where some type of commu- 
nications equipment resides. These sites are connected 
as presented in the TNIM architectures described in the 
previous chapters. The goal of network management 
or network system control is to maintain user-to-user 
service under changing traffic conditions, user requirements,
and system interruptions. To achieve this
goal, the available resources must be put to best use in
detecting and diagnosing faults or service degradation
so that service to the user is maintained.

The management of current distributed communication
networks is performed by technicians located at
geographically distributed locations. When a problem 
develops, each of the technicians attempts to ascertain 
what he/she believes to be the most appropriate step (or 
action) to take to resolve the problem. In most cases, the 
control action taken is determined through coordination 
among a group of technicians located at different sites. 
This scenario is very common in managing a telecom- 
munication system where the complex task is performed 
by a group of people, each of whom has a limited view 
of the problem. 

Given the distributed nature of TNIM, it is fair to as- 
sume that the control of the network functions must be 
distributed. With this assumption and a realistic expec- 
tation of a dynamic behavior of the network, a central- 
ized point of control cannot be expected to have com- 
plete and accurate knowledge concerning the function- 
ing of the overall network. 

There are three major aspects of network manage- 
ment in a distributed environment: 

i. Distributed situation assessment 
ii. Distributed problem diagnosis

iii. Distributed planning

Distributed situation assessment is needed in monitoring
system operating status. As stated earlier, in a distributed
network, no single node has complete and
fully current information on the status of other nodes.
Therefore, it is important to provide a mechanism for 
detecting a problem and identifying the impact of the 
problem on other nodes of the network, since a local 
problem may have a significant impact on other nodes 
of the network. A need for cooperation and communi- 
cation between the nodes exists in order to assess the 
overall operating status of the network. 

In order to identify the source of a service outage 
or service degradation, a distributed diagnosis system 
is required. The source of the faults may come from 
equipment malfunctions or disturbances from external
sources. A failure occurring in
one portion of the network often causes problems elsewhere.
Without cooperation among control nodes in a
network, no single node is aware of events outside its 
domain. Correct determination of the source of a prob- 
lem often requires corroborating evidence from other 
sources. 

Distributed planning should be considered when 
ways for restoring interrupted services to users must be 
determined. The major objective is to restore service to 
the most critical users, which may involve the reallo- 
cation of under-utilized resources or preemption of less 
critical users. Since resource utilization and user priority
change over time, and these changes affect the status
of the overall network, the sequence of control actions
needed to restore service requires appropriate coordination
among the sites that control the network.
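
The priority rule behind service restoration can be illustrated with a small greedy sketch. It covers only the decision made at a single site (serve the most critical users first, preempt those that no longer fit) and does not model the coordination among sites that the text calls for. The user names, priorities, and capacity units are hypothetical.

```python
def restore_service(capacity, demands):
    """Allocate the remaining capacity to users in order of criticality.

    demands: list of (user, priority, required_units); a lower priority
    number means a more critical user. Returns (served, preempted).
    """
    served, preempted = [], []
    remaining = capacity
    # sorted() is stable, so users of equal priority keep their input order.
    for user, _priority, units in sorted(demands, key=lambda d: d[1]):
        if units <= remaining:
            served.append(user)
            remaining -= units
        else:
            preempted.append(user)
    return served, preempted


# Hypothetical case: a failure leaves 10 capacity units for 15 units of demand.
served, preempted = restore_service(
    10,
    [
        ("science_video", 3, 6),
        ("life_support_telemetry", 1, 4),
        ("command_uplink", 1, 3),
        ("crew_email", 4, 2),
    ],
)
```

Here the two critical users are restored first, the large science video request is preempted, and the small low-priority request still fits in the capacity left over.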

Other categories of network management functions 
include: 

• Configuration management, 

• Performance management, and 

• Accounting management. 

Table 4-1 contains a list of network management functions.
The functional breakdowns are those defined by
the International Organization for Standardization (ISO).

It is the automation of these network management 
functions that forms the basis of this study. In Chapter 5 
these Network Management functions are examined to 
determine what technologies are needed to support the 
automation of unattended Mars TNIM operations. 


4.2 Description of Mars TNIM Man- 
agement and Control Functions 

As mentioned in the previous section, Integrated Network
Management (INM) functions fall into one of six
main categories:

1. Fault Management 

2. Configuration Management 

3. Accounting Management 

4. Performance Management 

5. Resource Management 

6. Security Management 

The following paragraphs describe these functions 
and list subfunctions where applicable. Possible implementations
of these functions are described in Chapter 5.
Three common architectures are the centralized,
the distributed, and the distributed peer INM, as shown
in Chapter 5 (Figures 5-2, 5-3, and 5-4).

4.2.1 Fault Management Functions 

A comprehensive set of functions for fault handling 
must be provided. These functions will include detect- 
ing, diagnosing, and recovering from network faults. 
Since much of the TNIM equipment will be difficult to 
access, fault handling functions must emphasize early 
prediction and prevention of faults or automated switchover
to redundant systems.

Included are functions for the following: 

• Prediction of faults through error log monitoring 
and statistical analysis, 

• Prevention through active measures such as equip- 
ment shutdown, 

• Detection of faults when they occur through mon- 
itoring systems, 

• Isolation of faults to specific systems through error 
log and link analysis, 

• Correction of faults, and 

• Use of contingency procedures when faults occur. 
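
As one possible reading of "prediction of faults through error log monitoring and statistical analysis," the sketch below compares the recent error rate of a link against a baseline taken from earlier, healthy operation. The threshold rule (baseline mean plus k standard deviations) and all numbers are assumptions made for illustration; the study does not prescribe a particular statistic.

```python
from statistics import mean, pstdev


def predict_fault(error_counts, baseline_len, window, k=3.0):
    """Return True when the mean error count over the most recent `window`
    intervals exceeds the baseline mean by more than k standard deviations.

    error_counts: errors logged per monitoring interval, oldest first.
    """
    if len(error_counts) < baseline_len + window:
        raise ValueError("not enough log intervals")
    baseline = error_counts[:baseline_len]
    recent = error_counts[-window:]
    threshold = mean(baseline) + k * pstdev(baseline)
    return mean(recent) > threshold


# Hypothetical error logs: eight baseline intervals, then three recent ones.
degrading = [2, 3, 2, 3, 2, 3, 2, 3, 9, 11, 12]
healthy = [2, 3, 2, 3, 2, 3, 2, 3, 3, 2, 3]

switch_needed = predict_fault(degrading, baseline_len=8, window=3)
no_action = predict_fault(healthy, baseline_len=8, window=3)
```

A positive prediction would be the trigger for a preventive measure such as equipment shutdown or switchover to a redundant unit, before a hard failure is detected.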

Table 4-1: Mars TNIM Network Functions and Network Interfaces

Users:
• Earth networks
• SEI Control Centers
• SEI spacecraft, MTV, MEV, Space Station
• Data processing centers
• Mars surface science
• Mars orbiting users

Network Functions:
• Fault management
• Configuration management
• Performance management
• Resource management
• Security management
• Accounting management

Network Nodes:
• Mars relay satellites
• Mars surface terminal
• Mars communications hub
• Deep Space Network
• TDRSS, ATDRS


4.2.2 Configuration Management Functions 

Configuration management functions include defin- 
ing, changing, monitoring, and controlling network re- 
sources and data. 

4.2.3 Accounting Management Functions 

In order to support the TNIM planning and schedul- 
ing functions, the status and availability of network re- 
sources must be monitored and maintained. Resource 
availabilities may vary due to planned outages such as 
the Mars Telecommunications relay satellites being out 
of Earth view, or due to unplanned events such as equip- 
ment failures. 

4.2.4 Performance Management Functions 

Performance Management functions include monitor- 
ing both the current and long-term performance of the 
network. Parameters monitored include: 

• Effective Link Data Rates 

• Link Data Quality 

• Time taken for link acquisition 

• Link down times 

4.2.5 Resource Management Functions 

Once high-level priorities and policies are defined and 
the major events planned, TNIM network resources 
need to be assigned to support them. This will result 
in a high-level event schedule and general resource al- 
location profile. For example, a general resource allo- 
cation might allow Mars inhabitants 4 hours per week 
on a video link to Earth. 

Resources managed in the Mars TNIM network in- 
clude: 


• Mars Relay Satellites (MRS) and their resources,
including antennas, transponders, and RF and digital
equipment.

• Mars Surface Terminals (MST) and their re- 
sources. 

• Mars Communications Hubs (MCH), if present. 

• Deep Space Network (DSN) resources. 

• Earth Relay Satellites, if present. 

4.2.6 Security Management Functions 

Security Management Functions ensure authorized ac- 
cess to the network resources and user information. 

4.3 Routing and Transmission Protocols

Routing and transmission protocols in an unattended 
telecommunication environment should ideally satisfy 
the following requirements: 

• The system should be demand-driven and self- 
configuring, adjusting automatically to changes in 
topology caused by equipment failure or degrada- 
tion, moving targets, or changes in bandwidth re- 
quirements. 

• The ground systems should require a minimal
knowledge of the underlying multiplexing and
polling schemes on board the spacecraft. This will
make it possible to change the multiplexing/polling
algorithm on the spacecraft without requiring
corresponding changes to the ground systems.

• The system should be capable of handling both 
packet-switched and circuit-switched data. 
• The system should be data-driven, requiring little
or no advance resource allocation for small extra 
resource requirements. 

• The system should be capable of buffering data 
to accommodate high-rate data collection, retrans- 
mission protocols, link outages, and the differ- 
ences in transmission rates. 

• The system should provide different grades of service
according to the criticality of the information
being sent. It should provide guaranteed receipt for
command and control applications, and a probability
of receipt for other classes of application.
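
Two of the requirements above, buffering across link outages and grades of service, can be sketched together as a priority store-and-forward buffer. This is an illustrative design, not one proposed by the study: the class name, the numeric priority scale (lower is more critical), and the `guaranteed` flag are assumptions. Guaranteed messages are held until acknowledged so they can be resent; other messages are sent once.

```python
import heapq
import itertools


class StoreAndForwardBuffer:
    """Buffers messages during outages and releases them by criticality."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # unique sequence number per message
        self.unacked = {}              # guaranteed messages awaiting an ack

    def enqueue(self, priority, guaranteed, payload):
        heapq.heappush(self._heap, (priority, next(self._seq), guaranteed, payload))

    def drain(self, link_up):
        """Return (seq, payload) pairs to send now, most critical first.
        Nothing is released while the link is down."""
        sent = []
        if not link_up:
            return sent
        while self._heap:
            priority, seq, guaranteed, payload = heapq.heappop(self._heap)
            if guaranteed:
                self.unacked[seq] = (priority, guaranteed, payload)
            sent.append((seq, payload))
        return sent

    def ack(self, seq):
        self.unacked.pop(seq, None)

    def requeue_unacked(self):
        """Put unacknowledged guaranteed messages back for retransmission."""
        for seq, (priority, guaranteed, payload) in sorted(self.unacked.items()):
            heapq.heappush(self._heap, (priority, seq, guaranteed, payload))
        self.unacked.clear()


buf = StoreAndForwardBuffer()
buf.enqueue(2, False, "science image")
buf.enqueue(0, True, "abort command")
during_outage = buf.drain(link_up=False)  # link down: everything stays buffered
first_pass = buf.drain(link_up=True)      # command goes out ahead of the image
buf.requeue_unacked()                     # no ack received: schedule a resend
second_pass = buf.drain(link_up=True)
buf.ack(second_pass[0][0])
```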

The network management approach should provide the
overlying set of services which manage these protocols 
and ensure that the network accomplishes these func- 
tions with minimum user interaction. This objective 
becomes most important in an unattended environment 
such as TNIM. 



Chapter 5 


Assessment of Network Management 
Technologies 


This chapter is organized as follows: 

5.1 Overview 

5.2 Network Management Technologies 

5.3 Fault Tolerance Technologies 

5.4 References 

5.1 Overview 

This chapter addresses Subtask 2 of the Statement of
Work, which requests an assessment of alternative,
innovative, system concepts which might affect tech- 
nology levels needed for unattended Mars network op- 
erations. After reviewing current space-based network 
approaches (for example DSN, TDRS, NASCOM), it 
has become evident that few if any innovative system 
concepts for unattended space network operations are 
in use or are being developed. 

An exception to this is spacecraft fault tolerance tech- 
nologies where much work has been done. The fault 
tolerance technologies were thought to be applicable to 
unattended network operations since these technologies 
are inherent to the design of an unattended network. An 
assessment of fault tolerance technologies is given in 
Section 5.3. 

As mentioned above, concepts and technologies for
space-based unattended network operations have received
little or no attention within NASA. Current space
communication networks in use by NASA are very 
manpower intensive in virtually all functional areas (for 
example scheduling, acquisition, and problem resolu- 
tion). 

One bright spot, however, is in the area of commercialized
network control. This area includes telephone
and information systems that are very competitive and
hence cost conscious. This area has seen much automation
of the processes and functions involved, some of
which could be applied to a space-based network. Even
these networks, however, do not run fully unattended;
man is still in the loop to handle faults.

It is also possible that the Department of Defense 
(DoD) has developed autonomous network concepts 
and technologies, but an extensive review of DoD space 
programs was not conducted in this study due to time 
and manpower restrictions. 

Obviously there is a strong need for defining and de- 
veloping the technologies required for an unattended 
space-based network. Section 5.2 contains the assess- 
ments of technologies needed to take man out of the 
loop in the context of a commercial terrestrial network. 
This section focuses on Artificial Intelligence (AI) technologies
that would replace men in the decision loop
with software.

After we learn how to run an unattended network
on Earth, we can apply these technologies to space networks,
where new difficulties arise due to the long distances
and high data rates involved.

5.2 Network Management Technolo- 
gies 

5.2.1 General Discussion 

An autonomous telecommunication network manage- 
ment system for the SEI environment must have the fol- 
lowing capabilities: 

• Manage a combination of interfacility (MCH to the
MRS, MCH to Earth), and intrafacility (MCH) networks.

• Manage a wide range of network resources, from
low level devices (e.g., repeaters and modems)
to intermediate systems (e.g., bridges, routers, and
gateways) to end systems (e.g., terminals).

• Provide a set of basic management services such
as those defined by the ISO (including fault management,
configuration management, accounting
management, performance management, resource
management, and security management), and
manage heterogeneous communications equipment.

5.2.2 Networking and Communication System 
Requirements 

The three major classes of requirements of an informa- 
tion management system which are levied on a commu- 
nications network are as follows: 

1. Network interconnection, 

2. Network bandwidth, and 

3. Network management functions. 

Appropriate technologies must be selected to manage 
these three critical areas. 

Network Interconnection describes the hardware and 
software devices required for connecting dif- 
ferent network components in a geographically 
distributed environment. This is generally
achieved by interconnecting Local Area Networks
(LAN’s) and Wide Area Networks (WAN’s).
Interconnecting heterogeneous networks introduces
problems due to the incompatibility of the rout- 
ing protocols, transmission delays and degradation 
of the network performance. Network elements 
such as bridges, routers, and gateways can be in- 
troduced to provide the necessary interconnection 
and translation of addresses and protocols. 

Space systems currently under development such 
as the Space Station and EOS are characterized by 
diverse protocol sets including high speed avion- 
ics busses, token-passing fiber optics busses, high 
speed parallel interconnections, circuit-switched 
networks with complex signalling protocols, and 
broadband networks. 


Network Bandwidth describes the ability of a system 
to store and transmit large volumes of information 
in a fast and efficient way. In a distributed client- 
server system, the network is responsible for main- 
taining connections between the network compo- 
nents. More complex applications are being pro- 
cessed at the workstations in the network instead of 
the mainframes. These computing nodes perform 
complex algorithm computations, file transfer and 
query processing. 

Modern computer networks are moving toward
distributed server-client architectures where specialized
data storage, processing, and input-output
functions are transparently distributed across the 
network. This results in significant requirements 
for bandwidth, both across links and at the network 
interfaces and interconnections. 

Network Management Functions are the functions 
provided between the network subsystems such as 
the following: 

• Fault management (detecting, diagnosing,
and recovering from network faults)

• Configuration management (defining, changing,
monitoring and controlling network resources
and data, integration of data, voice,
and video information)

• Performance management (tracking tactical 
and strategic performance of the network in- 
cluding trends analysis) 

• Accounting management (recording usage of 
network resources) 

• Security management (ensuring authorized 
access to the network resources and compo- 
nents) 

• Resource management and user directories 
(supporting directories for managing net- 
work assets and user information) 

These functions become increasingly complex as 
the networks become a more significant part of 
the computing environment and as additional func- 
tions are distributed across the network. 

5.2.3 Overview of Study Results 

Figure 5-1 depicts a suggested roadmap for assessing
appropriate technologies for implementing an unattended
network management system to meet the objectives
of the study. This roadmap identifies the current approaches
to Integrated Network Management (INM)
and the architectures for INM. Technological issues
with conventional approaches to INM implementations
in distributed networks are described, and protocol approaches
for managing network resources are also presented.

Figure 5-1: Roadmap to Technology Assessment for SEI Unattended Network Management

(Figure 5-1 text, "Issues" panel:

• Taking actions at temporal and spatial distance
• Coordination and reasoning with indeterminate knowledge
• Synchronization mechanisms demand complete global knowledge consistency
• Conventional approaches employ hierarchical indexing to satisfy
consistency at a high resource penalty.)

The principal result of this study is a recommendation
that a distributed/multi-agent problem solving approach
is necessary to achieve a truly robust and dynamic
network. Such an approach makes innovative use of
conventional software technologies and effective
application of emerging software technologies in the
field of Artificial Intelligence (AI). Conventional
network management approaches, such as the use of
signalling protocols and dynamic routing of
communications packets (e.g., ISDN or packet networks),
have been found to significantly enhance
our capability to provide management functions
for TNIM, but these approaches can break down under
the following conditions:

• Large time delays 

• Poor signal-to-noise ratios 


• Limited alternative communications paths 

• Lack of human capability to attend to unantici- 
pated problems 

These disadvantages can be overcome through the use
of distributed AI, which uses embeddable conventional
software technologies as agents along with embeddable
expert systems.

A key benefit of Distributed AI (DAI) in a distributed 
network is survivability, the capability to substitute or 
transfer functionality if a node is lost. Other benefits
of DAI are presented as well as existing tools and tools 
being developed in R&D laboratories. One major issue 
with DAI is distributed coordination. The frameworks 
for distributed coordination and solution approaches are 
presented. 

A tradeoff analysis of embeddable conventional and
AI technologies for DAI is presented in Table 5-1. Conventional
discrete event modeling and simulation technology,
operations research optimization algorithms,
automated data acquisition techniques, and conventional
database management system techniques should
be investigated for their applicability to performance
management, resource management, accounting management,
and data storage and retrieval management of
the TNIM system, respectively. The benefits and drawbacks
of embeddable AI technologies for all the network
management functions are also given in Table 5-1.

5.2.4 Objectives 

Given a heterogeneous mix of computer resources, net- 
worked together (via LAN’s, satellite links, or other 
method) to be applied cooperatively toward the solu- 
tion of a problem, technologies must be developed to 
address the following key issues: 

• Synchronizing the processing times of the dis- 
tributed processing elements, 

• Determining the state of a resource in a distributed 
system, 

• Ensuring the stability (robustness) of a distributed 
system, 

• Verifying the correctness of the execution of the 
distributed system. 

These issues can be addressed and resolved by apply- 
ing appropriate management technologies to the follow- 
ing three major components of distributed processing: 

i. Computation at the nodes, 

ii. Communication between the nodes, 

iii. Synchronization of the processes. 

5.2.5 Approach 

The following are the three common approaches employed
in the telecommunications industry to integrate
the management of networked heterogeneous communications
equipment, i.e., Integrated Network Management
(INM):

Integrator Approach: separate management tools are 
integrated by adding a new “supersystem” which 
has the capabilities to integrate the others. 

Translator Approach: one or more of the manage- 
ment systems translates its management informa- 
tion and functions to those of other proprietary sys- 
tems. The function of the proprietary system is 
to maintain its own network while allowing other 
management systems to attach to it. 


Open System Approach: a standards-based approach in which
all network elements and management systems
employ a common language and a common set of
functions. The main emphasis of this approach is the
development of an interoperable network management
system.

Four major efforts to achieve an architecture for inte- 
grated network management are currently being under- 
taken: 

i. AT&T (Unified Network Management Architec- 
ture, UNMA), 

ii. DEC (Enterprise Management Architecture, 
EMA), 

iii. IBM (Netview), and 

iv. OSI/Network Management (OSI/NM) Forum. 

AT&T uses the integrator approach, DEC and IBM 
employ a translator-like approach, while the OSI/NM 
Forum (a group of more than 60 international computer 
and telecommunications equipment vendors and service 
providers) promotes the use of existing standards and 
emerging standards to develop a set of specifications 
which will satisfy an interoperability standard. 

5.2.5.1 Possible Architectures for INM Implemen- 
tations 

Figures 5-2, 5-3, and 5-4 depict three possible Inte- 
grated Network Management (INM) implementations: 

i. Centralized INM with distributed communication 
nodes, 

ii. Distributed-hierarchical INM, and 

iii. Distributed peer INM. 

Each of these implementations must incorporate auto- 
mated tools for network management functions identi- 
fied above. 

5.2.5.2 Technological Issues with Conventional 
Approaches to INM Implementations in 
Distributed Networks 

The major problem in implementing a distributed com- 
munication network system is how to take action at a 
temporal and spatial distance. For example, when a 

Figure 5-2: Centralized Integrated Network Management with Distributed Intelligent Processes

(Figure 5-2 annotation: In this configuration, each node is provided with hardware/software to make it
operate as an intelligent process, with the network manager monitoring the
communication between the nodes and synchronization of the processes.)

Figure 5-3: Distributed Hierarchical Integrated Network Management

Table 5-1: Technology Tradeoff Analysis - Network Control and System Management Technologies

Fault Management

• Artificial neural systems
Benefits: Adaptive.
Drawbacks: No explanation capability.

• Rule-based expert systems
Benefits: Easy to develop. Proven technology.
Drawbacks: Not adaptive knowledge base updates. Maintenance cost of knowledge base.

• Hybrid model-based algorithmic systems
Benefits: Robust knowledge base.
Drawbacks: Requires wide knowledge of system. Not adaptive.

Configuration Management

• Hybrid frame/rule-based expert system
Benefits: Proven technology. Modular design.
Drawbacks: Not adaptive. May require too many rules.

• Case-based reasoning and machine learning
Benefits: Adaptive.
Drawbacks: Large storage requirement.

• Backward-chaining rule-based systems
Benefits: Proven technology.
Drawbacks: Regular knowledge base updates not acceptable.

• Distributed/concurrent problem solving (AI)
Benefits: Modular, extensible, fast, adaptable.
Drawbacks: Time synchronization and internodal communications issues.

Performance Management

• Conventional discrete event simulation
Benefits: Proven algorithms.
Drawbacks: Requires computationally intensive environment.

• Knowledge-based simulation
Benefits: Extensibility. Modular design.
Drawbacks: Interface with conventional data base management system (DBMS).

Resource Management

• Operations research optimization algorithms (e.g., linear programming)
Benefits: Excellent for problems with a small set of constraints.
Drawbacks: Expensive solutions for problems with a large set of constraints.
May produce infeasible solution.

• Hybrid AI heuristics and conventional resource allocation techniques
Benefits: Feasible solution for small and large problems.
Drawbacks: Solution is generally suboptimal.

Accounting Management

• Automated data acquisition tools

Data Storage

• Object-oriented DBMS
Benefits: Extensible. Intelligent query optimization techniques.
Drawbacks: Immaturity of research on persistent knowledge bases.

• Conventional DBMS
Benefits: Proven technology.
Drawbacks: Inefficient query optimization techniques. Static data formats.

Figure 5-4: Distributed Peer Integrated Network Management

(Figure 5-4 annotation: Intelligent Distributed Problem
Solving concepts must be employed for each subnet to
operate intelligently and for the whole system to operate
autonomously.)

communications link fails, new routing strategies must
be coordinated not only at the nodes involved, but also
at additional nodes involved in the alternative routing
strategy. The problem solver must reason about and coordinate
the remote effects of local decisions, and it must
reason with indeterminate knowledge.
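
The rerouting example can be made concrete with a small sketch that finds a fewest-hop route with breadth-first search and then recomputes it with a failed link removed. The topology (one surface terminal, two relay satellites with a crosslink, and the DSN) is a hypothetical simplification, and the search assumes complete and current knowledge of link status, which is exactly what the text warns a real node will not have.

```python
from collections import deque


def shortest_route(links, src, dst, failed=frozenset()):
    """Fewest-hop route from src to dst over bidirectional links,
    ignoring any link listed in `failed`. Returns None if unreachable."""
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in failed:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical topology; node names follow the acronyms used in this report.
LINKS = [
    ("MST", "MRS-1"), ("MST", "MRS-2"), ("MRS-1", "MRS-2"),
    ("MRS-1", "DSN"), ("MRS-2", "DSN"),
]

normal = shortest_route(LINKS, "MST", "DSN")
rerouted = shortest_route(LINKS, "MST", "DSN", failed={frozenset(("MRS-1", "DSN"))})
newly_involved = set(rerouted) - set(normal)  # nodes that must now be coordinated
```

Even in this small case the failure of one link draws in a node (MRS-2) that played no part in the original route, so the change has to be coordinated beyond the two ends of the failed link.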

The problems are exacerbated as the scale of the sys- 
tem increases. For example, it may not be possible for 
a node to immediately determine the status of another 
node or activate it in time to ensure that network
connectivity is maintained. Even current worldwide
communications systems (e.g., NASCOM) have serious
problems with synchronizing and coordinating
communications services. These problems can only be expected to
increase as we attempt to expand the communications to 
include lunar and Mars-based communications nodes. 

To minimize these problems, hierarchically arranged
regional synchronization mechanisms with complete
global consistency can be used. However, the strategies
to be employed must be carefully designed to cope
with time lag and synchronization problems. Hierarchical
indexing and synchronization are required for complete
consistency, but there is a resource penalty to be
paid.

Protocol Approaches for Managing Network Resources.
The use of protocols has been widely accepted
as an effective means for managing local area
and wide area networks in multi-vendor environments.
These protocols include:

• Common Management Information Protocol 
(CMIP), 

• Common Management Information Services and Protocol
over TCP/IP (CMOT), and

• Simple Network Management Protocol (SNMP). 

The most widely accepted and used protocol is the 
SNMP. 

There are two components to the implementation of
SNMP: an Agent and a Network Management Station.
The agent is software found on a variety of network
elements (bridges, routers, file servers, etc.). The agent
collects network statistics for the element on which it
resides and forwards the information when requested
by a network management station or when an
event occurs. Other network management tasks include
monitoring the status of various network elements,
reviewing error situations, and dynamically rerouting
network traffic around network nodes which are heavily
loaded. Because of its message passing and distributed
topology, SNMP has a design framework which lends
itself to implementing a distributed artificial
intelligence system.
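
The agent and management station roles described above can be sketched in a few lines of Python. This is a library-free illustration only and does not implement the SNMP wire protocol; the class names `Agent` and `ManagementStation`, the counter names, and the error threshold are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Agent:
    """Collects statistics for the network element on which it resides."""
    element: str
    error_threshold: int = 3  # hypothetical threshold that counts as an "event"
    stats: Dict[str, int] = field(default_factory=lambda: {"packets": 0, "errors": 0})
    notify: Optional[Callable[[str, str], None]] = None  # set by the station

    def record(self, packets: int = 0, errors: int = 0) -> None:
        self.stats["packets"] += packets
        self.stats["errors"] += errors
        # Forward information unsolicited when an event occurs.
        if self.notify and self.stats["errors"] >= self.error_threshold:
            self.notify(self.element, "error threshold exceeded")

    def get(self, name: str) -> int:
        """Answer a request from a management station."""
        return self.stats[name]


class ManagementStation:
    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}
        self.events: List[Tuple[str, str]] = []

    def register(self, agent: Agent) -> None:
        agent.notify = self.on_event
        self.agents[agent.element] = agent

    def on_event(self, element: str, message: str) -> None:
        self.events.append((element, message))

    def poll(self, name: str) -> Dict[str, int]:
        """Request one statistic from every registered agent."""
        return {element: agent.get(name) for element, agent in self.agents.items()}


station = ManagementStation()
router = Agent("router-1")
station.register(router)
router.record(packets=100, errors=1)
router.record(packets=50, errors=2)
print(station.poll("packets"))
print(station.events)
```

The station both polls (request and response) and receives unsolicited events, which are the two forwarding paths the text attributes to the agent.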

The difficulty of implementing large scale concurrent
networks can be eased by focusing on newer problem
solving techniques which do not depend upon complete
data consistency or complete knowledge of the
environment. Technologies being developed in the AI
discipline of Distributed Problem-Solving and Multi-agent
Systems, otherwise known as Distributed AI, provide the
tools necessary for managing unattended
telecommunication networks.

5.2.5.3 Role of Distributed Artificial Intelligence

Distributed Artificial Intelligence (DAI) is a branch of 
the AI discipline dealing with the cooperative solution 
of problems by a distributed and decentralized group of 
agents. The agents can be simple or complex processing 
elements. 

Distributed artificial intelligence deals with problems
that develop when a group of loosely coupled problem 
solving agents cooperate to solve a problem. In such a 
problem solving environment, each of the agents has a 
limited amount of knowledge of the problem and can 
only gain this knowledge through communication and 
coordination with other agents. Distributed AI falls into 
two categories: 

• Distributed Problem Solving (DPS) 

• Multi-Agent Systems. 

Figure 5-5 identifies the two categories and the 
frameworks for distributed coordination. By combin- 
ing conventional software problem solving techniques 
with emerging AI technologies, DAI is able to solve 
complex problems in a distributed environment. In a 
telecommunication network, by applying a framework 
similar to the SNMP, the status of each element or node 
on the network can be monitored and any faults can be 
detected with conventional software or hardware tech- 
nologies. Judicious selection of AI and conventional 
methodologies can be employed to manage the faults. 

A modeling and simulation technique can be employed
to monitor the performance of the processing
elements within the network. The performance
information will help in identifying the constraints within
the network. The information about the network
constraints can then be used for managing the resources
within the network. A plan for implementing such a
system, removing the human element from the loop
and thereby achieving a minimally attended network in
TNIM, is presented in Chapter 6.

Figure 5-5: Distributed Artificial Intelligence

[Figure text:

Major Issue: Distributed Coordination

• Coordination between agents to share knowledge
of problem and solution

• Reasoning about coordination processes

Solution Approaches for Distributed Coordination:

• Explicit Control

• Explicit Synchronization and Communication

• Functionally Accurate/Cooperative Approach

• Reasoned Control: requires embedded intelligent systems.

Tools for DAI]

Distributed Problem Solving (DPS). This approach 
deals with how the work of solving a particular prob- 
lem can be divided among a number of modules that 
cooperate by dividing and sharing knowledge about the 
problem and the solution that develops. 

Multi-Agent Systems. This approach deals with
coordinating intelligent behavior among a collection of
autonomous, intelligent agents which can coordinate
their knowledge and capabilities to solve problems.
These agents may be working towards a single global
goal or towards separate, interacting individual goals.

The following are some of the benefits of Distributed 
Artificial Intelligence: 

• Inherent parallelism in the approach speeds up 
computations and problem solving; 

• Reliability and survivability are improved through
redundancy;


• Helps to achieve increased modularity and recon- 
figurability; 

• Accommodates open systems (systems with no 
complete representation and with dynamically 
changing boundaries); 

• Adaptability: concurrent systems are inherently 
more adaptable than sequential systems; 

• Multiple perspectives: different problem solvers 
can bring several perspectives to bear on one prob- 
lem. 

5.2.5.3.1 Frameworks for Distributed Coordination
and Problem Solving. A critical issue with DAI
is coordination between the agents in DPS and
multi-agent systems: sharing knowledge of the problem
being solved and the solution expected, and reasoning
about the coordination processes among the agents.
Some of the frameworks which have been developed
for accomplishing this coordination are as follows:

Blackboard frameworks. A collection of knowledge
sources relies upon a global scheduler and a
centrally shared knowledge base, or blackboard,
for communication, consistency and control.
This framework can also be used for multiple
interacting blackboard problem solvers.

Contracting or Negotiation Frameworks use bid- 
ding, contracting, and information-exchange pro- 
tocols to allocate work or resolve conflicts. 

Multi-agent Planning Frameworks use a single
agent or a group of agents to form a plan for solving
a multi-agent problem. Dependencies and conflicts
among the actions and knowledge of different
agents are identified in advance. Each agent is
provided with knowledge about the communication
and synchronization needs of other agents.
(These frameworks have problems with reasoning
about the effects of concurrent actions.)

Integrative frameworks provide a set of communication
and consistency mechanisms supporting a
number of complementary paradigms for problem
solving.

Open system frameworks provide theoretical and 
practical models of flexible, self-configuring com- 
munication and coordination frameworks. These 
frameworks may include locally reconfigurable 
communities of agents. 
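
As an illustration of the contracting or negotiation framework listed above, the following minimal sketch has a manager announce a task, collect bids, and award the work to the best bidder. The node names, capacity units, and the "largest remaining spare capacity wins" bid rule are assumptions made for this example and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    capacity: float  # total capacity in arbitrary units (hypothetical)
    load: float = 0.0

    def bid(self, demand: float) -> Optional[float]:
        """Bid the spare capacity left after taking the task, or decline with None."""
        spare = self.capacity - self.load - demand
        return spare if spare >= 0 else None


def announce_and_award(demand: float, nodes: List[Node]) -> Optional[Node]:
    """Announce a task, collect bids, and award it to the highest bidder."""
    bids = [(node.bid(demand), node) for node in nodes]
    valid = [(b, n) for b, n in bids if b is not None]
    if not valid:
        return None  # no agent can accept the task
    _, winner = max(valid, key=lambda pair: pair[0])
    winner.load += demand  # the winner commits its resources
    return winner


nodes = [
    Node("MRS-1", capacity=10.0, load=8.0),
    Node("MRS-2", capacity=10.0, load=3.0),
    Node("Hub", capacity=6.0, load=1.0),
]
winner = announce_and_award(4.0, nodes)
print(winner.name if winner else "no award")
```

No node needs global knowledge here: each one evaluates the announcement against its own state, which is the property that makes this style of framework attractive for loosely coupled agents.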

5.2.5.3.2 Issues arising from Distributed Coordination.
The issue of how to allocate work to a collection
of agents over time to maximize the values of some
performance criteria must be resolved. The approaches 
used vary according to the amount of knowledge each 
agent has: 

Explicit control uses explicit centralized constraints 
which are minimally adaptive. 

Explicit synchronization and communication uses 
semi-centralized interaction constraints which are 
adaptive to temporal uncertainties. 

Functionally accurate/cooperative approach uses
semi-centralized opportunistic control with fixed
interactions which are adaptive to some temporal
uncertainty.

Reasoned control (where agents use knowledge of
themselves and others to build and revise coordination
frameworks). This approach provides predictions
and adaptive interactions. It is more adaptive to
semantic, temporal and interactional uncertainty,
and permits minimal sharing and decentralization.

5.2.5.3.3 Tools for Distributed Artificial Intelligence.
Four kinds of tools are available for DAI
experimentation and development:

i. Integrative systems; 

ii. Experimental testbeds; 

iii. Distributed, object-oriented languages; and 

iv. Paradigm-specific shells. 

Integrative systems provide the framework to combine
a variety of paradigm-specific tools and methods
into a useful system. ABE (from TecKnowledge
Inc.) is a framework for integrating a number
of heterogeneous, independently developed,
problem-solving paradigms and software tools.
AGORA is an opaque, high level operating system
developed to integrate heterogeneous hardware
systems under a common operating system.

Experimental Testbeds. Multi-Agent Computing En- 
vironment (MACE) is a generic testbed for build- 
ing DAI systems of varying levels of granular- 
ity. MACE allows integration of different problem 
solving and communication structures by provid- 
ing the programmers with a collection of tools such 
as knowledge representation and reasoning tools, 
pattern matchers and remote demons. 

Distributed Object-Oriented Programming Environments.
LISP, C++ and UNIX operating systems.

Paradigm-Specific Shells. Concurrent Blackboards
(e.g., BBOX from TecKnowledge Inc.).

5.2.6 Network Management Functions and 
Technologies 

In general, network management consists of a combina- 
tion of human, software, and hardware elements. The 
human elements consist of network administrators who 
make decisions on network management. The software 
and hardware elements are the automated network man- 
agement tools which provide management capabilities 
for the network. These tools perform the following net- 
work management functions: 
1. Fault Management: detecting, diagnosing, and
recovering from network faults.

2. Configuration Management: defining, changing, 
monitoring and controlling network resources and 
data. 

3. Accounting Management: recording usage of net- 
work resources. 

4. Performance Management: tracking current and 
long term performance of the network (trend anal- 
ysis). 

5. Resource Management and User Directory: Sup- 
porting directories for managing network assets 
and user information. 

6. Security Management: Ensuring authorized ac- 
cess to the network resources and components. 

These classes of functions correspond to the classes 
of functions named in the ISO Network Management 
framework. An architecture for these management 
functions is described by Figure 5-6. 

The Network Interface is the equivalent of a user in- 
terface in an attended network management system. It 
will serve as a monitor for all the management func- 
tions. It provides the functions ordinarily performed by 
a human user. 

The Database/Knowledge base stores the data and 
knowledge about the network. The data is used when 
conventional software is used for automating some net- 
work functions, and the knowledge base is used when 
expert system tools are used to implement or augment 
certain network management functions. Distribution 
and format of this data base are major issues. 

The Management Protocol Engine is responsible for 
providing the means by which the network manager can 
communicate with the network management functions 
(agents) in individual network components. It also pro- 
vides the mechanism for management data acquisition. 

Protocol Stacks provide the interface to the networks 
being managed. 

5.2.6.1 Fault Management Technologies 

AI technologies such as diagnostic expert systems and
artificial neural systems can be used for detecting,
diagnosing, and correcting network faults. There are
currently many excellent off-the-shelf software packages
which provide adequate functionality to support
development of robust AI fault management systems.

Diagnostic expert systems rely on the capture of
problem information and specific recommendations from
human experts. A rule-based expert system uses a
combination of a data base which stores rules in IF...THEN
format and a set of inference algorithms to make
decisions about the characteristics of the problem and
appropriate actions.
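
A rule base in IF...THEN form together with a simple inference algorithm can be sketched as follows. The sketch uses forward chaining; the fault names and rules are invented for illustration and do not come from any TNIM rule base.

```python
from typing import FrozenSet, List, Set, Tuple

# Each rule is (IF all of these facts hold, THEN assert this fact).
Rule = Tuple[FrozenSet[str], str]

# Hypothetical diagnostic rules, for illustration only.
RULES: List[Rule] = [
    (frozenset({"no_carrier", "power_ok"}), "transmitter_fault"),
    (frozenset({"transmitter_fault", "spare_available"}), "switch_to_spare"),
    (frozenset({"no_carrier", "power_low"}), "power_fault"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Fire every rule whose conditions hold until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


observed = {"no_carrier", "power_ok", "spare_available"}
derived = forward_chain(observed, RULES)
print(sorted(derived - observed))
```

The first rule produces a diagnosis and the second turns that diagnosis into a recommended action, which mirrors the split between problem characteristics and appropriate actions described above.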

For simple subsystems, rule-based systems with
predictably robust performance are easy to develop, but
complex systems often require more powerful strategies
such as model-based systems. Model-based expert
systems use an internally defined model to reason about the
system, both to trace causal relationships and to explore
possible correction strategies. Hybrid approaches
often prove to be the most workable and effective systems.

Artificial neural systems provide excellent means 
(learning paradigms and models) for detecting errors 
and classifying errors (diagnosing). Their principal 
benefits are their ability to provide robust response over 
widely varying conditions, the ease of programming 
(training) them to handle new situations, and their ca- 
pability to provide real-time response once trained. 

A combination of the use of artificial neural systems 
and expert systems is ideal for fault management. Re- 
covery from network faults can be represented as a set 
of rules which will operate on the results from the Arti- 
ficial Neural System (ANS) and model-based reasoning 
systems. 

5.2.6.2 Configuration Management Technologies

Configuration management is a prerequisite for effec- 
tive application of the other elements of network man- 
agement. The database of all network components 
(e. g., hardware, software, circuits or lines) must be 
made available to help in scheduling and tracking of 
changes to the configuration. 

Techniques for configuration management. The
artificial intelligence approach can be employed. One
approach, using a hybrid rule/frame-based methodology,
applies a set of rules that specify what a complete or
legal solution must include when presented with a set of
initial choices from among a set of options, together
with their implications or constraints. It must also apply
conflict resolution rules to arrive at a legal solution.
Figure 5-7 presents a process flow diagram for
configuration management.

A machine learning technique can be developed and
trained on a set of examples to recognize patterns,
thereby learning how to configure the network when
certain patterns appear in the system. Such an adaptive
system is preferable to a rule-based system, which
demands that new rules be developed whenever
situations occur that are not represented in
the rule base. Also, there may be too many rules
to write. For an unattended network system, an adaptive
system is highly recommended.

Case-Based Reasoning Systems can be applied in 
configuring a communication network system. If a 
knowledge of possible cases exists, then by using a li- 
brary of past cases, the system can reconfigure the net- 
work autonomously. 
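
The retrieval step of a case-based reasoning system can be sketched as a nearest-neighbor lookup over a library of past cases. The state features, the stored configurations, and the unscaled Euclidean distance are all assumptions made for this example; a practical system would also adapt the retrieved case and store the outcome as a new case.

```python
from typing import Dict, List, Tuple

# A case pairs an observed network state with the configuration that was applied.
Case = Tuple[Dict[str, float], str]

# Hypothetical case library, for illustration only.
CASE_LIBRARY: List[Case] = [
    ({"links_up": 3, "load": 0.2}, "direct_routing"),
    ({"links_up": 2, "load": 0.6}, "crosslink_routing"),
    ({"links_up": 1, "load": 0.9}, "degraded_mode"),
]


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance over the features of state a (features are not scaled)."""
    return sum((a[key] - b[key]) ** 2 for key in a) ** 0.5


def retrieve(state: Dict[str, float], library: List[Case]) -> str:
    """Return the configuration of the most similar past case."""
    _, configuration = min(library, key=lambda case: distance(state, case[0]))
    return configuration


print(retrieve({"links_up": 2, "load": 0.7}, CASE_LIBRARY))
```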

The monitoring part of configuration management
can be performed by a backward-chaining rule-based
diagnostic expert system which runs repeatedly until
a particular recommendation is encountered and then
alerts the appropriate embedded system, or by a
forward-chaining system that monitors data patterns and
informs the controlling function when a particular
pattern is encountered.

5.2.6.3 Accounting Management Technologies

Data being generated at different nodes of the network
will need to be buffered and retrieved in an efficient
manner and in a form understood by a calling agent. It is
necessary to investigate efficient data acquisition,
storage and retrieval technologies for managing the
information. A combination of conventional approaches to
data management and object-oriented data base
management technologies will be investigated to handle data
generation at each node and data distribution to
requesting agents within the network.

5.2.6.4 Performance Management Technologies 

This involves measurement and analysis of resource uti- 
lization and network response time. Engineering traffic 
statistics are provided to aid in predictive network per- 
formance. 

Approaches require discrete event simulation tech- 
nologies. Knowledge-based simulation techniques and 
object oriented knowledge representation schemes are 
employed to aid system modularity and extensibility. 
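
A conventional discrete event simulation for measuring response time can be sketched with a time-ordered event list. The example below models a single first-in, first-out link with a fixed service time; the arrival times and service time are hypothetical inputs chosen for illustration.

```python
import heapq
from typing import List, Tuple


def simulate(arrivals: List[float], service_time: float) -> List[float]:
    """Return the response time of each message sent through one FIFO link."""
    # Event tuples are (time, sequence number, kind, message id); the sequence
    # number breaks ties so that events at equal times pop in insertion order.
    events: List[Tuple[float, int, str, int]] = []
    for i, t in enumerate(arrivals):
        heapq.heappush(events, (t, i, "arrive", i))
    seq = len(arrivals)
    queue: List[int] = []
    busy = False
    response = [0.0] * len(arrivals)

    while events:
        now, _, kind, msg = heapq.heappop(events)
        if kind == "arrive":
            queue.append(msg)
        else:  # "depart": the message has finished transmission
            response[msg] = now - arrivals[msg]
            busy = False
        if not busy and queue:
            nxt = queue.pop(0)
            busy = True
            heapq.heappush(events, (now + service_time, seq, "depart", nxt))
            seq += 1
    return response


times = simulate([0.0, 1.0, 1.5], service_time=2.0)
print(times)  # → [2.0, 3.0, 4.5]
```

Response time grows for later messages because they wait behind earlier ones, which is the kind of utilization and delay statistic that performance management needs.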


5.2.6.5 Security Management Technologies 

Technologies that guarantee the correct coordination of 
the agents (software) are required for preventing incor- 
rect activation and execution of other agents. These 
technologies are necessary to ensure a secure and re- 
liable system. 

5.2.6.6 Resource Management Technologies 

For small and simple tasks, forward-chaining,
rule-based expert systems or operations research techniques
(linear programming) can be used.

Mid-size or large and complex resource management
tasks demand the use of hybrid tools (object-oriented and
rule-based reasoning).
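
The trade-off between exact optimization and heuristics can be illustrated on a small allocation problem: choose which requests to grant within a fixed link capacity so that total value is highest. In this sketch exhaustive search stands in for an exact method (it is not linear programming), and a value-per-unit greedy rule stands in for a heuristic. The request names, bandwidths, and values are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Request = Tuple[str, int, int]  # (name, bandwidth needed, value)


def exact(requests: List[Request], capacity: int) -> Tuple[int, List[str]]:
    """Try every subset; optimal, but the cost doubles with each added request."""
    best_value, best_subset = 0, ()
    for size in range(len(requests) + 1):
        for subset in combinations(requests, size):
            if sum(req[1] for req in subset) <= capacity:
                value = sum(req[2] for req in subset)
                if value > best_value:
                    best_value, best_subset = value, subset
    return best_value, [req[0] for req in best_subset]


def greedy(requests: List[Request], capacity: int) -> Tuple[int, List[str]]:
    """Grant requests in order of value per unit of bandwidth; fast but not optimal."""
    chosen, used, value = [], 0, 0
    for name, bandwidth, val in sorted(requests, key=lambda q: q[2] / q[1], reverse=True):
        if used + bandwidth <= capacity:
            chosen.append(name)
            used += bandwidth
            value += val
    return value, chosen


requests = [("telemetry", 1, 3), ("video", 5, 10), ("science", 5, 9)]
print(exact(requests, 10))
print(greedy(requests, 10))
```

On this input the greedy rule returns a feasible allocation quickly but misses the best one, while the exhaustive search finds the optimum at a cost that becomes impractical as the number of requests grows.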

5.3 Fault Tolerance Technologies 

A key issue in providing unattended operations for
TNIM is the extent to which the network will be fault
tolerant. The TNIM system must provide significant
levels of fault tolerance at the system, subsystem,
and component levels to achieve its mission. This is
driven by a number of considerations:

• Minimal availability of manpower and resources 
at Mars or in Mars orbit to repair malfunctions. 

• Limited capability for remote diagnosis due to the 
difficulties of Mars-Earth communications (time 
delay, pointing, bandwidth). 

• Critical support requirements during maneuvers 
such as aerobraking and launch. 

• Long mission life including both transit from Earth 
to Mars and on-site and on-orbit at Mars. 

• Need for man-rated reliability during the later 
phases of SEI. 

Most importantly, the long journey time between
Earth and Mars and the remoteness of the Mars
location dictate that all SEI subsystems be engineered so
that they are immune to multiple failures or that
mission objectives can be safely carried out in the face of
significant loss or degradation of component functions.
This requirement is even more critical than for previous
missions such as Mercury, Apollo, Skylab, or the
Shuttle flights since SEI missions will be inherently longer
in duration.
TNIM must be engineered so that:

• Components are unlikely to fail during their 
planned mission life cycle; 

• Unanticipated component failures are detected and 
redundant components are substituted with little or 
no immediate human input; 

• Further subsystem or system-level failure detec- 
tion and correction mechanisms are available to 
deal with more serious large-scale failures; and 

• As a last resort, degraded modes of operation are 
available to reduce the risk of loss of life or critical 
SEI resources while remote diagnosis and correc- 
tion is performed. 
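
The substitution of redundant components, with a degraded mode as the last resort, can be sketched as a simple selection rule. The unit names and the `DEGRADED` label are hypothetical; the health flags are assumed to come from an on-board failure detection function that is not shown.

```python
from typing import Dict, List

DEGRADED = "DEGRADED"  # hypothetical label for the last-resort operating mode


def select_unit(priority: List[str], healthy: Dict[str, bool]) -> str:
    """Return the first healthy unit in priority order, or DEGRADED if none remain."""
    for unit in priority:
        if healthy.get(unit, False):  # units with unknown status are treated as failed
            return unit
    return DEGRADED


units = ["transponder-A", "transponder-B"]
print(select_unit(units, {"transponder-A": False, "transponder-B": True}))
print(select_unit(units, {"transponder-A": False, "transponder-B": False}))
```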

5.3.1 Objectives 

The fault tolerance technologies study deals with both 
system and component level technologies required to 
achieve the above goals. The objective is to under- 
stand the potential failure modes of TNIM equipment, 
particularly communications satellites and unattended 
switching equipment, and describe technologies which 
must be developed to improve potential fault tolerance 
of TNIM systems. A number of these technologies are 
also applicable to other SEI systems. 

5.3.2 Approach 

Several methods of information collection were used in 
this study. Included were a literature search of exist- 
ing methods for fault tolerant systems engineering and 
interviews with Loral satellite operations personnel fa- 
miliar with communications satellite failure modes. 

Within the resource limits of the study we attempted
to collect information about existing communications
satellite systems and their failure modes, and about
approaches that have been used in both deep space
missions and DoD survivable satellite programs where
autonomous fault tolerance is a critical mission
requirement. We also studied a number of NASA, DoD, and
commercial communications programs with significant
reliability and fault tolerance requirements.

5.3.3 Fault Tolerance Engineering Process 

The process of engineering fault tolerant systems is
fairly well understood and a substantial body of both
practical engineering and theoretical literature exists,
including both NASA and DoD standards [7] (see
footnote 1). The most common thread of thought
throughout the literature of fault tolerance is that fault
tolerance cannot be achieved by ad hoc or piecemeal
methods [6]. For a system to be fault tolerant it must be
designed from the start with a clear process for
evaluating potential failure modes and
engineering in fault tolerance at the systems level.

The steps of a typical fault tolerance engineering pro- 
cess as might be applied to TNIM are as follows: 

a. Identify critical subsystems and their elements.

b. Describe available technology alternatives. 

c. Describe system-level impacts of alternatives 
(weight, power, thermal, operational complexity, 
etc.). 

d. Identify and rank for criticality potential failure 
modes. 

e. Describe potential causes of failure modes. 

f. Identify system architecture alternatives. 

g. Describe fault handling scenarios for each alterna- 
tive. 

h. Describe and quantify redundancy and fault cov- 
erage characteristics for each alternative. 

i. Describe system-level impacts of subsystem fail- 
ures for each architecture alternative. 

j. Describe system-level reliability (MTBF) and re- 
pair (MTTR) distributions for alternatives. 

k. Conduct trades against impacted system parame- 
ters. 

l. Investigate potential preventative actions or new 
system architecture alternatives that minimize im- 
pacts. 

m. Select system, subsystem, and element architec- 
tures. 

n. Identify contingency provisions to deal with un- 
avoidable risks. 

o. Describe technology development needs to reduce 
risks. 
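
Quantifying redundancy and the reliability (MTBF) and repair (MTTR) characteristics of an alternative usually starts from a few standard formulas. The sketch below assumes a constant failure rate (exponential model) and statistically independent redundant units; both are modeling assumptions, and all numerical values are hypothetical.

```python
import math
from typing import Iterable


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the unit is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving the mission, assuming a constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)


def parallel(reliabilities: Iterable[float]) -> float:
    """Reliability of redundant units when any one surviving unit is sufficient."""
    probability_all_fail = 1.0
    for r in reliabilities:
        probability_all_fail *= 1.0 - r
    return 1.0 - probability_all_fail


# Hypothetical unit: 100,000 hour MTBF over a five-year (43,800 hour) mission.
single = reliability(43_800, 100_000)
print(round(single, 3))
print(round(parallel([single, single]), 3))
print(availability(99.0, 1.0))
```

Comparing the single-unit and two-unit results shows how much a redundant spare buys, which can then be traded against the weight and power impacts identified in the earlier steps.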

1 References are given at the end of the Chapter under “Fault Tol- 
erance Technologies”. 
TNIM presents a substantial challenge for fault-tolerant
engineering at all design levels. Table 5-2
shows some of the issues associated with TNIM
system-level communications fault tolerance.

most critical component, the Mars Relay Satellites 
(MRS) and determine technologies which have the po- 
tential to increase reliability and fault tolerance of these 
systems. 

5.3.4 Communication Satellite Failure Modes

The Mars Relay Satellites (MRS’s) are the heart of 
TNIM and are the single most critical portion of the net- 
work. Faults which occur on the Earth at DSN ground 
stations may be detected, locally diagnosed and cor- 
rected using techniques identical to those used for ex- 
isting missions where high levels of man-rated avail- 
ability are expected. Moreover, it is relatively simple 
to include additional redundant receivers, transmitters, 
and antennas to minimize the potential of DSN ground 
element failure. Reliability of communications
equipment on-board SEI spacecraft and on the Mars surface
presents a somewhat more difficult problem, but these
elements do not represent critical bottlenecks. Moreover,
fault-tolerant technology solutions for the MRS’s will also
be applicable to these TNIM nodes. 

Figure 5-8 shows the typical life of a modern
communications satellite and categories of failures.
Interviewed operations personnel divided the life of a
satellite into three major phases:

Initial Deployment. This phase typically involves the 
first 60 days of a satellite's life when it is being
launched, inserted into geosynchronous orbit, and 
checked out. The most common failure during this 
period is launch vehicle failure which accounts for 
the loss of approximately 5% of all satellites. De- 
sign problems are typically detected and corrected 
during this phase by human intervention. For ex- 
ample, in one of the early Intelsat III launches a 
bearing seizure problem was solved by rotating the 
spacecraft so the bearings were cooled. 

Routine Operations. This phase represents the core of
useful life of the satellite after initial system
problems have been fixed, typically for the next 5 to
10 years. For most missions, it is characterized by
relatively stable behavior with largely predictable
slow degradation of non-mechanical components
(traveling wave tubes and batteries) and depletion of
propulsion capabilities. Occasional failures occur,
particularly in mechanical components such
as data recorders, momentum wheels, actuators,
scanning mirrors, and joints. Many problems are
fixable by planned subsystem fail-over using
redundant systems.

Extended Operations. This phase represents the end
of the useful life of the satellite. Some redundant
subsystems may no longer be available. Problems
which occur at this point are often serious
and typically arise in one of four critical system
areas: attitude control, communications, power,
or data recorders. As in the Routine Operations
phase, these failures typically occur in systems
with substantial mechanical, chemical, or high
voltage components. Since backups may not be
available, substantial human expertise may be
necessary to develop a creative workaround.

In many ways the operation of the MRS’s will be 
similar to that of a typical geosynchronous spacecraft 
(e.g., TDRS) since communications with Earth-based
control systems will be continuous (although limited 
by link delay and occasional outages due to eclipses). 
Much of the body of operations and design experience 
associated with low earth orbit or deep space spacecraft 
should be sufficient to overcome these problems. The 
key challenge is to make the subsystems of the MRS 
sufficiently reliable and provide sufficient fault toler- 
ance so that mission requirements can be fulfilled with- 
out interruptions of critical functions. 

5.3.5 Critical Technologies 

Table 5-3 lists the basic subsystems of a synchronous 
communications satellite, potential sources of failure, 
and technologies which have the potential to improve 
reliability. In the following sections we describe some 
potential technology developments in the following 
critical areas: 

1. Attitude control 

2. Communications 

3. Power 

4. Data Recorders 

5. Fault Detection, Diagnosis, Management, and Re- 
pair 
Table 5-2: System Level Fault Tolerance Issues

Element or Link: Issues

Mars Relay Satellites (MRS):

• Maintenance of critical communications through partial or complete failure
of on-board systems for maintaining health and safety.

• Maintenance of communications under attitude or antenna pointing system
degradation or failure.

• Failover across MRS’s.

Mars Communications Hub (MCH):

• Failure of critical switching or data buffering systems.

• Failure of receivers, transmitters, modulators, or demodulators.

• Potential bottleneck and single-point failure for Earth-Mars communications.

Mars SEI Nodes:

• Failure of receivers, transmitters, modulators, or demodulators.

• Failure of local processors and networks.

• Failure of local data buffers.

In-Transit SEI Vehicles:

• Maintenance of links during critical maneuvers such as aerobraking.

• Failure of receivers, transmitters, modulators, or demodulators.

• Maintenance of high accuracy pointing.

Deep Space Network Ground Stations:

• Loss of link due to weather outages.

Mars-to-Mars Links:

• Maintenance of link capability during partial failures (e.g., of an MRS).

Mars-to-Earth Links:

• Maintenance of link capability during partial failures (e.g., of an MRS).

• Maintenance of link capability through solar interference.

Figure 5-8: Failure Categories of Typical Communications Satellite

CHAPTER 5. ASSESSMENT OF NETWORK MANAGEMENT TECHNOLOGIES 


Table 5-3: Potential Failures by Subsystem and Technology Development Needed

Attitude Control

Components: Sensors; torquers; angular momentum storage.

Potential failures: Horizon sensors mech. failure; momentum wheel failure;
loss of earth pointing; pointing noise; aging of electronics.

Technologies for development: CCD/fiber optics horizon sensors; low mass star
sensors; low mass, high reliability inertial reference units (e.g., fiber optic
gyros, mechanical gyros); magnetic torquers; improved gyro configurations to
support on-board detection of gyro failures; solar sails.

Propulsion

Components: Orbit injection; orbit correction; torquer.

Potential failures: Propellant runout; propulsion control loss; valve failure.

Technologies for development: Ion propulsion technologies; improved efficiency
chemical propulsion systems; expert system-based detection and correction of
valve closure problems.

Electric Power

Components: Power source; storage; control and distribution.

Potential failures: Catastrophic short circuits; battery failure; relay
failure; long term radiation damage; aging of electronics; mechanical failure
of solar array pointing mechanisms.

Technologies for development: Low mass, nuclear power sources; high efficiency
solar cells; low mass, high reliability batteries; recyclable fuel cells;
intelligent power control systems; non-mechanical relays and power switching
systems.

Thermal Control

Components: Coatings; insulation; active control.

Potential failures: Heater malfunctions; aging of coatings and blankets.

Technologies for development: On-board thermal modeling and protection; expert
system-based detection and correction.

Structure

Components: Main structure; deployment mechanisms.

Potential failures: Deployment mechanics failure.

Technologies for development: Composite materials; zero-gravity mechanical
engineering.

Telemetry, Command, and Data Processing

Components: On-board computers and network interfaces; encoders and muxes;
decoders and demuxes.

Potential failures: Software errors and failures; single bit transient errors;
multiple hard errors due to radiation bursts.

Technologies for development: Software-based fault tolerance; improved
software validation techniques; redundant processor networks; wafer-scale
integration; radiation-hardened integrated circuits.

Communications

Components: Antennas; switching; transponders; receivers; transmitters.

Potential failures: Antenna rotary joint failure; traveling wave tube failure;
aging of electronics.

Technologies for development: Coupling and bearing technology; solid state
Ka-band power amplifiers; optical communications; MMIC technology.

Data Recording and Playback

Potential failures: Tape recorder failure.

Technologies for development: Solid state recorders; optical recorders.

Launch

Components: Earth to LEO transport; LEO to Mars transport.

Potential failures: Launch system failure.

Technologies for development: High reliability launch vehicles; on-orbit gas
and coil guns.

Operations

Potential failures: Operator error.

Technologies for development: Automated validation of command sequences both
on-ground and on-board.

5.3.5.1 Attitude Control

Existing three-axis stabilized spacecraft rarely need
commanding or correction of the attitude control
functions. In a sense this is the most automated set of
functions in a modern satellite. For deep space spacecraft,
significant fault tolerance is an essential requirement in
the attitude control system. For example, the Attitude
and Articulation Control System of Galileo has
significant fault tolerant, adaptive algorithms; it detects scan
platform pointing disturbances (e.g., spacecraft
motion, actuator friction, and structural flexibility) and
autonomously compensates.

Principal failure modes involve mechanical elements 
of the attitude control system, particularly momentum 
wheel failure, gyro failure or failure of scanning mirrors 
used in earth/sun/star sensors. Potential improvements 
in gyro configurations [5] may lead to enhanced ability 
to support on-board detection or correction of failures. 
Use of fiber optic or micromechanical gyros may improve
the reliability of inertial reference units and allow
additional redundancy at no additional weight [1]. For
example, Boeing Aerospace and Electronics has developed
a 0.5 kg breadboard containing fiber optic gyros
and accelerometers under an Air Force contract. CCD
or fiber optic-based sensors may remove the danger of
failure of mirror mechanics in Mars/sun/star sensors.

A potential additional benefit of these technologies
may be significantly reduced spacecraft mass, potentially
allowing the incorporation of additional redundancy
(e.g. the attitude control system is nearly 10% of the
Intelsat V dry mass). Moreover, a ripple-through effect
may result in decreased power, thermal, and fuel
requirements, allowing further reliability to be
incorporated in the spacecraft systems.

5.3.5.2 Communications

Traveling Wave Tubes (TWT’s) used in spacecraft
communications systems are now typically delivering over
100,000 hours Mean Time Between Failures (MTBF) in
space applications. They are subject to both long-term
degradation and occasional catastrophic failure due to
their high voltage requirements. The power required for
Ka-band TWT’s is substantially beyond the capabilities
of current reliable TWT technology and as such is a key
issue in MRS fault tolerance. Moreover, operation of
TWT’s at high powers may require additional cooling
systems, further affecting system reliability through the
introduction of additional spacecraft components. There
is some potential for substituting solid state power
amplifiers to increase communications reliability, but the
high operating powers may also affect amplifier life and
introduce the need for additional cooling capabilities.
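
To put the MTBF figure above in mission terms, the following sketch
estimates the probability that a TWT survives a long mission. It assumes
a constant failure rate (exponential model), which does not capture the
long-term degradation noted above. The 10-year mission length and the
two-unit redundant arrangement are illustrative assumptions, not values
from this study.

```python
import math

HOURS_PER_YEAR = 8766.0  # average year length, leap years included


def survival_probability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability that one unit survives the mission (constant failure rate)."""
    return math.exp(-mission_hours / mtbf_hours)


def one_of_n_survival(unit_reliability: float, n_units: int) -> float:
    """Probability that at least one of n independent, powered units survives.

    Assumes perfect detection and switching, which is optimistic.
    """
    return 1.0 - (1.0 - unit_reliability) ** n_units


if __name__ == "__main__":
    mission_hours = 10 * HOURS_PER_YEAR  # assumed 10-year mission
    single = survival_probability(100_000.0, mission_hours)
    print(f"single TWT survives:       {single:.3f}")
    print(f"at least one of two works: {one_of_n_survival(single, 2):.3f}")
```

Under these assumptions a single tube is more likely than not to fail
within the mission, which is one way to see why amplifier redundancy and
alternative technologies receive attention here.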

Monolithic Microwave Integrated Circuits (MMIC)
technology has the potential to substantially reduce the 
size, power requirements, parts count, and wiring com- 
plexity of on-board communications electronics. Each 
of these factors affects reliability. Moreover, the devel- 
opment of electronically steerable antenna arrays facil- 
itated by MMIC technology may reduce the MRS re- 
quirements for failure-prone mechanical antenna actu- 
ators, a significant issue given the size and complexity 
of potential antenna array designs. 

Optical link technologies have been advocated as a way
to increase throughput and reduce weight of MRS’s. 
However, the microradian pointing accuracies required 
for optical links introduce a potential new problem of 
reliability of fine pointing mechanisms. This may force 
the introduction of backup radio frequency links with 
additional penalties in size and complexity and poten- 
tial additional problems in reliability. Additional relia- 
bility problems may also result from high voltage cir- 
cuits required for gas or solid lasers and the relatively 
short operational life of laser diodes operating at high 
power levels. 


5.3.5.3 Power 

Power is typically considered to be the most critical of
spacecraft systems since all other functions depend on it.
Critical problems associated with power are occasional
catastrophic bus failure, long-term degradation of
battery function, and loss of pointing capabilities for
solar arrays. Significant improvements in reliability may
be feasible if we can develop low-weight nuclear power
sources (e.g. radioisotope thermoelectric generators) to
reduce battery and solar array requirements without the
substantial weight penalties associated with the sources
used in current deep space missions. Catastrophic bus
failures may be controlled to some degree by providing
redundant power buses and control, but there is substantial
potential for enhanced on-board power monitoring
and control capabilities to reduce potential damage from
electrical component failures.
5.3.5.4 Data Recorders

On-board data recorders are notoriously unreliable and
difficult to manage. Any buffering done on MRS’s at the
anticipated data rates will pose significant operational
and reliability problems. Potential technology
developments that may improve the reliability of
on-board data recording systems include massive
space-qualified semiconductor memories, such as those
being developed by Fairchild, and space-qualified optical
disks, such as those being developed by GE under
contract to Langley Research Center.

Since optical disks require some mechanical parts 
and precise positioning mechanisms they may be an 
inherently less fault tolerant technology than massive 
semiconductor memories, but their substantially higher 
data densities (by several orders of magnitude) may 
be necessary to fill TNIM requirements for buffering 
multi-megabit per second data streams. 
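
The scale of the buffering problem follows directly from data rate
multiplied by the time a link is unavailable. The sketch below works
this out; the 10 Mbit/s rate and the 40 minute outage are assumed
values chosen only for illustration, not TNIM requirements.

```python
def buffer_bits(rate_bits_per_s: float, outage_s: float) -> float:
    """Storage needed to hold a continuous stream for the whole outage."""
    return rate_bits_per_s * outage_s


def bits_to_gigabytes(bits: float) -> float:
    """Convert bits to gigabytes (1 GB taken as 10**9 bytes)."""
    return bits / 8.0 / 1e9


if __name__ == "__main__":
    rate = 10e6          # assumed stream rate, bits per second
    outage = 40 * 60.0   # assumed link outage, seconds
    needed = buffer_bits(rate, outage)
    print(f"{needed:.3g} bits = {bits_to_gigabytes(needed):.1f} GB")
```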

5.3.5.5 Fault Detection, Diagnosis, Management,
and Repair 

Given the potential 40 minute delay in Earth-Mars
communication, an MRS should provide internal mechanisms
to detect and correct internal faults. Some work
has been done in this area through efforts such
as satellite autonomy research funded by the Air Force
Rome Air Development Center or by JPL. Much of the
work in general large-scale fault detection and correction
has been superseded by relatively tightly focused
efforts to improve the autonomy of specific subsystems.
Probably the most successful instances of this sort of
capability are the on-board mission sequencing and fault
detection, isolation, and correction capabilities in deep
space efforts like the Galileo spacecraft.

JPL has defined a number of levels of sophistication 
associated with control of fault recovery for spacecraft 
systems: 

Level 0: Passive control with no active functions (e. g. 
gravity gradient). 

Level 1: Minimum active control, single string, no re- 
dundant functions. 

Level 2: Redundancy and cross strapping, ground 
commanded switching. 

Level 3: Internal sensing of prespecified mission-critical
faults with self preserving switching. Includes
use of dead-man timer, automatic safe-hold entry.

Level 4: Self-checking of hardware commands to 
avoid internal faults. Ground interaction required 
for fault recovery. 

Level 5: Autonomously fault-tolerant to faults defined 
a priori. Senses faults, performs management or 
switching (e. g. switch to redundant IR detectors 
in an Earth sensor). Has contingency programs on- 
board. Specified operation not degraded by single 
faults. 

Most subsystems for current communications satel- 
lites tend to be at Levels 2 or 3. Ideally, the most critical 
subsystem functions for MRS’s should be at Level 5. In
principle there are no major technical constraints associated
with developing this level of autonomy (assuming
some minor improvements in space qualified computers 
and spacecraft control software). 
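
As a minimal sketch of what Level 5 behavior could look like in
software, the example below checks telemetry against a table of faults
defined a priori and switches to a redundant string without ground
interaction. The unit name, telemetry field, and threshold are
hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class RedundantUnit:
    """A function implemented by several interchangeable hardware strings."""
    name: str
    strings: List[str]
    active: int = 0
    failed: Set[int] = field(default_factory=set)

    def switch_to_spare(self) -> bool:
        """Mark the active string failed and select the next healthy one."""
        self.failed.add(self.active)
        for index in range(len(self.strings)):
            if index not in self.failed:
                self.active = index
                return True
        return False  # redundancy exhausted


# Faults defined a priori: unit name -> test applied to a telemetry sample.
# The field name and the 0.1 threshold are assumptions for illustration.
FAULT_TESTS: Dict[str, Callable[[Dict[str, float]], bool]] = {
    "earth_sensor": lambda tlm: tlm["ir_detector_signal"] < 0.1,
}


def fault_management_cycle(telemetry: Dict[str, float],
                           units: Dict[str, RedundantUnit]) -> List[str]:
    """Run one monitoring pass and return a log of the actions taken."""
    log = []
    for name, is_faulty in FAULT_TESTS.items():
        if is_faulty(telemetry):
            unit = units[name]
            if unit.switch_to_spare():
                log.append(f"{name}: switched to {unit.strings[unit.active]}")
            else:
                log.append(f"{name}: no spare left, entering safe-hold")
    return log


if __name__ == "__main__":
    units = {"earth_sensor": RedundantUnit("earth_sensor",
                                           ["detector A", "detector B"])}
    print(fault_management_cycle({"ir_detector_signal": 0.02}, units))
```

Once the spares are exhausted the sketch falls back to a safe-hold
entry, which corresponds to the Level 3 behavior described above.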

A key issue is coverage, i. e. the extent to which 
faults can be detected, diagnosed, and corrected before 
serious damage is done. Coverage can be improved by 
a variety of strategies: 

Passive techniques: Making sure that critical parameters
(e.g. temperatures) change slowly, so that
there is sufficient time for Earth-based or on-board
systems to detect and correct faults.

Hierarchical fault detection and management schemes:
Providing a hierarchy of fault detection and diagnostic
schemes (e.g. expert system-based) to fix
faults.

Heartbeat signals: Regularly required “I am well”
messages from subsystems, monitored by a centralized
or distributed health and safety subsystem
(e.g. as used in Voyager).

Voting techniques: Multiple control systems governed
by voting logic (e.g. as used in the
Apollo lander, Shuttle avionics, or Stratus
computers).

Model-based techniques: Maintenance of computer
models and comparator logic to test whether the
real subsystem is behaving in a manner similar to
the model.
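
Three of the strategies above (heartbeat signals, voting, and
model-based comparison) can be sketched in a few lines each. The class
and function names, timeouts, and tolerances below are our own
illustrative choices and do not describe the Voyager, Apollo, or Shuttle
implementations.

```python
from collections import Counter


class HeartbeatMonitor:
    """Flags subsystems whose last "I am well" message is too old."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def beat(self, subsystem, now_s):
        self.last_seen[subsystem] = now_s

    def silent(self, now_s):
        """Return the subsystems that have missed their heartbeat."""
        return sorted(name for name, seen in self.last_seen.items()
                      if now_s - seen > self.timeout_s)


def majority_vote(values):
    """Return the value reported by a strict majority, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


def deviates_from_model(measured, predicted, tolerance):
    """Comparator logic: does the real subsystem disagree with its model?"""
    return abs(measured - predicted) > tolerance


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=5.0)
    monitor.beat("power", now_s=0.0)
    monitor.beat("thermal", now_s=8.0)
    print(monitor.silent(now_s=10.0))
    print(majority_vote([1, 1, 0]))
    print(deviates_from_model(measured=21.0, predicted=20.0, tolerance=0.5))
```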





5.4 References 

Network Management Technologies (§ 5.2)

1. Joseph, C. A., “Integrated Network Management
in an Enterprise Environment”, IEEE Network 
Magazine, July 1990, pp. 7-13. 

2. Macleish, K. J., Thiedke, S. and Vennergrund, D., 
“Expert Systems in Office Switch Maintenance”, 
IEEE Communications Magazine, Sep. 1986, 
Vol. 24, No. 9, pp. 26-33. 

3. Halsall, F. and Modiri, N., “An Implementation of 
an OSI Network Management System”, IEEE Net- 
work Magazine, July 1990, pp. 44-54. 

4. Huhns, Michael, N., “Distributed Artificial Intelli- 
gence”, Research Notes in Artificial Intelligence, 
Morgan Kaufmann Publishers, Inc., Los Altos, 
CA, 1987. 

5. Gasser, Les, “Distributed Artificial Intelligence”, 
AI Expert Magazine, July 1989, pp. 26-33. 

6. Bowers, A. W. and Connell, E. B., “A Checklist
of Communications Protocol Functions Organized 
Using the Open System Interconnection Seven- 
Layer Reference Model”, IEEE Proceedings of 
COMPCON, 1983, pp. 479-487. 

7. Lesnansky, E. and W. Turner, “High Volume 
Data Handling and Distribution in the EOS Era”, 
AIAA/NASA Second International Symposium 
on Space Information Systems, 17-19 September, 
1990, Pasadena, CA. 

8. Atkinson, D. and James, M., “Applications of AI
for Automated Monitoring: the SHARP System”, 
AIAA/NASA Second International Symposium 
on Space Information Systems, 17-19 September, 
1990, Pasadena, CA. 

9. Varsi, G., Man, G. and Rodriquez, G., “Automa- 
tion of Planetary Spacecraft”, Unmanned Systems 
Magazine, Fall 1986, pp. 21-31. 

10. Dupuy, A., Schwartz, J., Yemini, Y. and Bacon, D., 
“NEST: A Network Simulation and Prototyping 
Testbed”, Communications of the ACM, Vol. 33, 
No. 10, October, 1990, pp. 63-74. 


Fault Tolerance Technologies (§ 5.3)

1. J. Deyst, J. Elwell, and E. Womble, “A Revolution
in Advanced Guidance Systems is Coming”,
Aerospace America, October 1990.

2. G. Gilley, “Architectural Design Methods of 
Transient Fault Protection”, AIAA Computers in 
Aerospace VI Conference, Wakefield, Massachu- 
setts, October 7-9, 1987. 

3. P. S. Goel et al., “Autonomous Safe Mode Opera- 
tions of the Attitude Control System of Indian (IRS 
and INSAT-II) Satellites to Prevent Catastrophe”, 
Workshop in Spacecraft Autonomy: Present and 
Future Capabilities, Pasadena, CA, September 13- 
15, 1988. 

4. J. Gray, “Why Do Computers Stop and What Can 
Be Done About It”, IEEE Fifth Symposium on 
Reliability in Distributed Software and Database 
Systems, Los Angeles, California, January 13-15, 
1986. 

5. S. Murugesan and P. S. Goel, “Autonomous 
Reconfiguration of Spacecraft Attitude Refer- 
ence System Using DTGs”, Workshop in Space- 
craft Autonomy: Present and Future Capabilities, 
Pasadena, California, September 13-15, 1988. 

6. Charles E. Roth, Jr. et al., “Reliability in Space Ve- 
hicles”, Engineering Publishers, Elizabeth, New 
Jersey, 1965. 

7. Philip Turner, “Autonomous Spacecraft Design 
and Validation Methodology Handbook”, Issue 2, 
7030-4, Jet Propulsion Laboratory, Pasadena,
California, August 1984.

8. G. Varsi, G. Man, and G. Rodriquez, “Automation 
of Planetary Spacecraft”, Unmanned Systems, Fall 
1986. 





Chapter 6 

Technology Development Plan 


In this chapter, we present a plan for developing 
the necessary technologies for unattended telecommunications
network management. Such a program must
be multifaceted, emphasizing development of more ro- 
bust underlying technologies for satellite telecommu- 
nications, including hardware, communications proto- 
cols, and control software. Some areas such as the de- 
velopment of more fault tolerant communications satel- 
lite technology have a significant overlap with other ar- 
eas of SEI, since a comprehensive approach to fault tol- 
erance is critical for nearly every SEI function. 

We recommend the development of a testbed that applies
new technologies, and innovative uses of existing
technologies, to determine the best-performing approaches
for implementing unattended Mars communications
networks and the feasibility of applying these
technologies to the Mars mission requirements. We
divide the development of this testbed into the following
stages:

Initial simulation and modeling: a comprehensive
program of simulation and modeling designed to
provide an early understanding of the performance
characteristics of proposed TNIM network management
strategies and the impact of these strategies on
SEI operations.

Earth-based networking: tests of TNIM network 
management concepts using existing Earth-based 
networks, existing communications satellites, and 
near-Earth resources such as ACTS. 

Lunar operations: tests of TNIM network manage- 
ment concepts during the lunar exploration phase 
of SEI. 

Preliminary Mars-based operations: early tests of 
TNIM network management concepts using early 
Mars-based resources. 



This chapter is organized as follows: 

6.1 Testbed for Managing Network Resources

6.2 Basis of TNIM Network Management 
Testbed Approach 

6.3 Objectives of the Testbed 

6.4 Distributed Artificial Intelligence 

6.5 Schedule and Resources 

6.1 Testbed for Managing Network Resources

Earth-Mars communications delays, a major design 
driver for TNIM network management, are significantly
less for near-Earth and Earth-Moon links. As a re- 
sult we may be able to initially learn more from sim- 
ulation and modeling about the effectiveness of TNIM 
network management strategies than from early space- 
based tests. With an appropriately constructed model- 
ing environment, it would be easy to introduce factors 
which simulate time delays, geometric blockages based 
on orbit behavior, and deep space noise sources, and it- 
erate to an optimized network management architecture 
at a relatively low cost. Additional operational experi- 
ence in a space environment would then serve to further 
validate the models and architecture. 
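
The time delay factor, for example, can be derived from the Earth-Mars
distance alone. The sketch below computes the round trip light time; the
distances used in the example (0.5 AU and 2.5 AU) are round illustrative
values chosen by us, and they give round trip times of roughly 8 and 42
minutes.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # km per second
AU_KM = 149_597_870.7               # kilometers per astronomical unit


def round_trip_minutes(distance_au: float) -> float:
    """Round trip signal time, in minutes, for a given one-way distance."""
    one_way_s = distance_au * AU_KM / SPEED_OF_LIGHT_KM_S
    return 2.0 * one_way_s / 60.0


if __name__ == "__main__":
    for distance in (0.5, 2.5):  # assumed near and far Earth-Mars distances, AU
        print(f"{distance} AU: {round_trip_minutes(distance):.1f} min round trip")
```

A modeling environment would feed a value of this kind into every
Earth-Mars link so that management strategies are exercised against
realistic reaction times.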

For this reason we have focused on proposing a simu- 
lation and modeling testbed which can provide an early 
understanding of TNIM network management issues 
and provide a basis for further planning of space ex- 
periments to validate these concepts. The principal ob- 
jective of the testbed would be to determine what ap- 
proaches and what distribution of management can best 
support unattended network management in the context 
of TNIM. 

6.2 Basis of TNIM Network Management Testbed Approach

At present, local area and wide area networks consist
of multi-vendor environments. The use of protocols
has been widely accepted as an effective means
for managing these networks. These protocols include:

• Common Management Information Protocol 
(CMIP) 

• Common Management Information Services and
Protocol over TCP/IP (CMOT)

• Simple Network Management Protocol (SNMP) 

Although CMIP is regarded as the future of network 
management, it is still being developed and no off-the- 
shelf tools or networks have CMIP capabilities. The 
most widely accepted and used protocol is SNMP.

SNMP has a design framework which lends itself
to implementing a distributed network management
system based on the concepts described in Chapter 5 of
this report. There are two components to the
implementation of SNMP:

• Agent 

• Network Management Station 

The agent is software found on a variety of network
elements (bridges, routers, file servers, etc.). The agent
collects network statistics for the element on which it
resides and forwards the information when requested
by a network management station or when an
event occurs. Other network management tasks include
monitoring the status of various network elements,
reviewing error situations, and dynamically rerouting
network traffic around network nodes which are heavily
loaded.
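
The division of labor between agent and management station can be
sketched as follows. This is a plain Python illustration of the two
roles, not the SNMP protocol or any SNMP library; the class names,
counter names, and error threshold are all assumptions made for the
example.

```python
class Agent:
    """Collects statistics for one network element (hypothetical sketch)."""

    def __init__(self, element, send_trap, error_threshold=10):
        self.element = element
        self.stats = {"packets_in": 0, "packets_out": 0, "errors": 0}
        self._send_trap = send_trap            # callback into the station
        self._error_threshold = error_threshold
        self._trap_sent = False

    def record(self, counter, amount=1):
        """Update a counter; report unprompted when errors cross the threshold."""
        self.stats[counter] += amount
        if (counter == "errors" and not self._trap_sent
                and self.stats["errors"] >= self._error_threshold):
            self._trap_sent = True
            self._send_trap(self.element,
                            f"error count reached {self.stats['errors']}")

    def get(self, counter):
        """Answer a request from the management station."""
        return self.stats[counter]


class ManagementStation:
    """Polls registered agents and receives their event reports."""

    def __init__(self):
        self.agents = {}
        self.traps = []

    def register(self, agent):
        self.agents[agent.element] = agent

    def receive_trap(self, element, message):
        self.traps.append((element, message))

    def poll(self, counter):
        return {name: agent.get(counter) for name, agent in self.agents.items()}


if __name__ == "__main__":
    station = ManagementStation()
    router = Agent("router-1", station.receive_trap, error_threshold=3)
    station.register(router)
    router.record("packets_in", 5)
    for _ in range(3):
        router.record("errors")
    print(station.poll("packets_in"), station.traps)
```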

We propose that NASA fund an effort to create an 
integrated testbed which would initially utilize SNMP 
concepts to model agent and network components of a 
series of TNIM architectures. Initial modeling would 
take place within a single workstation with software 
processes designed to simulate TNIM nodes and links. 
Such a simulation would be highly parameterized and 
include specific software elements to model communi- 
cations delays and outages based on orbit geometry and 
known models of deep space communications interference.
Specific faults and communications loads could
be introduced into the model, and the performance of
the model under various conditions could be monitored
and analyzed.
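
A very small version of such a model is sketched below: one link with a
fixed propagation delay and a list of outage windows, driven by a
time-ordered event queue. The 1200 second delay and the outage interval
are assumed parameters; a real testbed would derive them from orbit
geometry.

```python
import heapq


class DelayedLink:
    """A link with a fixed one-way delay and scheduled outage windows."""

    def __init__(self, delay_s, outages=()):
        self.delay_s = delay_s
        self.outages = list(outages)   # (start_s, end_s) intervals

    def available(self, time_s):
        return not any(start <= time_s < end for start, end in self.outages)


def simulate(link, send_times):
    """Return (delivered, dropped) for messages sent at the given times.

    delivered holds (sent_s, arrived_s) pairs in arrival order; dropped
    holds the send times that fell inside an outage window.
    """
    events, dropped = [], []
    for sent in send_times:
        if link.available(sent):
            heapq.heappush(events, (sent + link.delay_s, sent))
        else:
            dropped.append(sent)
    delivered = []
    while events:
        arrived, sent = heapq.heappop(events)
        delivered.append((sent, arrived))
    return delivered, dropped


if __name__ == "__main__":
    link = DelayedLink(delay_s=1200, outages=[(100, 200)])  # assumed values
    print(simulate(link, [0, 150, 300]))
```

Faults can be introduced by adding outage windows, and loads by
lengthening the list of send times, which keeps the model highly
parameterized in the sense described above.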

The initial strategy would be to purchase an off-the-shelf
network management package based on SNMP and in- 
sert software elements which would support detailed 
modeling of TNIM nodes, links, and management func- 
tions. Over a period of several years the approach would 
be expanded to include management of selected TNIM 
network functions emulated by Loral or NASA com- 
puters and links in a geographically dispersed network 
using either domestic satellite links or NASA-provided 
facilities such as ACTS. 

6.3 Objectives of the Testbed 

The proposed testbed would have the following objectives:

• Identify candidate TNIM distributed network man- 
agement architectures and provide preliminary es- 
timates of network performance under anticipated 
communications scenarios. 

• Test the performance of routing protocols and their 
ability to provide connectivity and dynamic re- 
sponse to network loading. 

• Identify and test strategies for routing network traf- 
fic under both routine link interruptions and equip- 
ment failure. 

• Support an iterative process of discovery of new 
TNIM network management requirements through 
experiment, observation, and analysis. 
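
As a minimal sketch of the routing experiments these objectives call
for, the example below finds a fewest-hop route by breadth-first search
and recomputes it after links are marked failed. The four-node topology
(a surface node, two relay satellites with a crosslink, and Earth) is a
simplified assumption for illustration and is not the proposed TNIM
architecture.

```python
from collections import deque


def shortest_route(links, source, destination, failed=frozenset()):
    """Fewest-hop route over bidirectional links, skipping failed ones."""
    neighbors = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # destination unreachable


# Hypothetical topology for illustration only.
LINKS = [
    ("surface", "MRS-1"), ("surface", "MRS-2"),
    ("MRS-1", "MRS-2"),                      # crosslink
    ("MRS-1", "Earth"), ("MRS-2", "Earth"),
]

if __name__ == "__main__":
    print(shortest_route(LINKS, "surface", "Earth"))
    print(shortest_route(LINKS, "surface", "Earth",
                         failed={("MRS-1", "Earth")}))
```

A testbed would replace the hop count with measured link delay or load,
but the experiment has the same shape: inject a failure, recompute, and
compare the resulting connectivity.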

6.4 Distributed Artificial Intelligence 

As described in the previous sections, Distributed Arti- 
ficial Intelligence (DAI) deals with the cooperative so- 
lution of problems by a decentralized group of agents 
(i. e., software programs on distributed nodes). The 
group of agents is decentralized so that both control and 
data are often logically or physically distributed. Since 
there are several architectures and paradigms for imple- 
menting cooperative problem solving in a distributed 
environment such as a distributed communication net- 
work, the proposed testbed would serve to support the 
investigation of the effectiveness of tools for DAI in 
unattended communications network applications. 

Specifically, we would evaluate the following: 
Table 6-1: Schedule for Unattended Network Management Testbed

Months after Start    Development Step
------------------    ----------------
1-3                   Develop testbed requirements and procure
                      equipment and software.
4-7                   Develop alternative TNIM architectures and
                      configure SNMP and simulation software.
8                     Develop scenarios for testing of TNIM
                      architectures.
9-12                  Conduct testing and prepare report on results.
13-16                 Develop approaches for distributed intelligent
                      network management.
17-20                 Develop and embed distributed agents in SNMP
                      software.
21-24                 Test system and report results.


ABE is a framework for integrating a number of
heterogeneous, independently developed problem-solving
paradigms and software tools.

AGORA is a layered environment which supports
the design and development of large evolutionary
problem solving systems.

MACE (Multi-Agent Computing Environment) is a
testbed for building experimental DAI systems at
different levels of abstraction. As in SNMP,
MACE computational units are called agents, and
they execute in parallel. MACE has the added
capability of communicating via messages, which
makes the system more modular.

AF (Activation Framework) supports the implementation
of real-time artificial intelligence programs on
multiple interconnected computers. It is based on
a model of a community of experts communicating
by passing messages among one another.
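
The message-passing style common to these tools can be illustrated with
a small generic sketch. It does not reproduce the interface of ABE,
AGORA, MACE, or AF; the class, the topic names, and the two "expert"
rules are hypothetical and are meant only to show cooperating agents
that interact solely through messages.

```python
from collections import deque


class Community:
    """Delivers messages between registered experts through a shared queue."""

    def __init__(self):
        self.handlers = {}      # topic -> list of handler functions
        self.mailbox = deque()  # pending (topic, payload) messages
        self.log = []           # every message delivered, in order

    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def post(self, topic, payload):
        self.mailbox.append((topic, payload))

    def run(self):
        """Deliver messages until no expert has anything further to say."""
        while self.mailbox:
            topic, payload = self.mailbox.popleft()
            self.log.append((topic, payload))
            for handler in self.handlers.get(topic, []):
                handler(self, payload)


def diagnosis_expert(community, payload):
    # Hypothetical rule: a link that missed its heartbeat is declared failed.
    community.post("link_failed", payload)


def routing_expert(community, payload):
    # Hypothetical rule: ask for traffic to be routed around the failed link.
    community.post("reroute_requested", {"avoid": payload["link"]})


if __name__ == "__main__":
    community = Community()
    community.subscribe("heartbeat_missed", diagnosis_expert)
    community.subscribe("link_failed", routing_expert)
    community.post("heartbeat_missed", {"link": "MRS-1 to Earth"})
    community.run()
    for topic, payload in community.log:
        print(topic, payload)
```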


6.5 Schedule and Resources 

In this section we describe a limited plan for building 
an initial testbed for the development of TNIM unat- 
tended network management concepts. Preliminary de- 
velopment could be performed according to the two- 
year schedule given in Table 6-1 and illustrated in Fig- 
ure 6-1. 

The cost of developing the proposed testbed would 
be approximately the following in 1991 dollars: 


SNMP software                          $15,000
Modeling and simulation software       $18,000
X-windows, Unix and C software          $3,000
Sun SPARC workstation                  $13,000
DAI tools                              $10,000
2 full time equivalents, 2 years      $416,000
6 round trips to Cleveland, Ohio        $6,000
                                      --------
Estimated total cost                  $481,000
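
The total can be checked with a short calculation, which also shows
that labor accounts for about 86% of the estimate. The per-FTE-year
figure is derived by us from the line items above and is not stated in
the source.

```python
COSTS_1991_USD = {
    "SNMP software": 15_000,
    "Modeling and simulation software": 18_000,
    "X-windows, Unix and C software": 3_000,
    "Sun SPARC workstation": 13_000,
    "DAI tools": 10_000,
    "2 full time equivalents, 2 years": 416_000,
    "6 round trips to Cleveland, Ohio": 6_000,
}

total = sum(COSTS_1991_USD.values())
labor = COSTS_1991_USD["2 full time equivalents, 2 years"]
labor_share = labor / total
cost_per_fte_year = labor / (2 * 2)   # 2 FTEs for 2 years

if __name__ == "__main__":
    print(f"total: ${total:,}")
    print(f"labor share: {labor_share:.0%}")
    print(f"derived cost per FTE-year: ${cost_per_fte_year:,.0f}")
```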


The products of the proposed testbed efforts would 
be as follows: 

• Series of reports describing model design, network 
management architecture alternatives, and experi- 
ment results. 

• Multiple models of TNIM network management 
architectures including documentation and analy- 
ses. 

• Quarterly reports and design review presentation 
materials. 

Results from this testbed would serve as the basis of 
further development of space-based tests of TNIM net- 
work management concepts. 









[Gantt chart covering January 1991 through December 1992; the tasks
shown are the development steps listed in Table 6-1.]

Figure 6-1: Two-Year Schedule for Testbed Development for Mars TNIM Unattended Network